Failures and breakthroughs – exposed, reflected, considered

Bitcoin: how many exist, lost and its quantum computing future


Let’s start by setting up some context on just how much it costs to verify one Bitcoin transaction. A report on Motherboard recently calculated that verifying a single Bitcoin transaction consumes as much electricity as 1.6 American households use in a day. The Bitcoin network may consume up to 14 gigawatts of electricity by 2020 (comparable to the electricity consumption of Denmark), with a low estimate of 0.5 GW.

There is much written about theft of Bitcoin, as people are exposed to cyber criminals, but there are also instances where people simply lose their coins. In cases of loss, it is almost always impossible to recover the lost Bitcoins. They remain in the blockchain, like any other Bitcoins, but are inaccessible because it is impossible to find the private keys that would allow them to be spent again.

Bitcoin can be lost or destroyed in a number of ways.

Sometimes, not only individuals but also experienced companies make big mistakes and lose their Bitcoins. For example, Bitomat lost the private keys to 17,000 of its customers’ Bitcoins. Parity lost $300m of cryptocurrency due to several bugs. And most recently, more than $500 million worth of digital coins were stolen from Coincheck.

Many Bitcoin losses also date from Bitcoin’s earliest days, when the mining reward was 50 Bitcoins a block and Bitcoin was trading at less than 1 cent. At that time, many people didn’t care if they lost their (private) keys or simply forgot about them; one man famously threw away a hard drive containing 7,500 Bitcoins.

Let’s briefly analyse Bitcoin’s creation and supply. The theoretical maximum number of Bitcoins is 21 million; Bitcoin has a controlled supply. The Bitcoin protocol is designed so that new Bitcoins are created at a decreasing and predictable rate: the number of new Bitcoins minted per block is halved roughly every four years (every 210,000 blocks) until issuance halts completely, with a total of about 21 million Bitcoins in existence.
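To make the halving schedule concrete, here is a short Python sketch (illustrative, not actual node code): the block subsidy starts at 50 BTC and is cut in half every 210,000 blocks, with the halving done in integer satoshis as in the reference implementation. Summing the series shows why the supply approaches, but never quite reaches, 21 million.

```python
# Sketch of Bitcoin's issuance schedule: the block subsidy starts at 50 BTC
# and halves every 210,000 blocks until it rounds down to zero satoshis.
def total_supply_btc():
    subsidy = 50 * 10**8          # initial subsidy in satoshis (1 BTC = 1e8 satoshis)
    blocks_per_halving = 210_000
    total = 0
    while subsidy > 0:
        total += subsidy * blocks_per_halving
        subsidy //= 2             # integer halving, as in the consensus rules
    return total / 10**8          # convert back to BTC

print(total_supply_btc())         # just under 21 million (about 20,999,999.9769 BTC)
```

The series terminates once the integer subsidy rounds down to zero, leaving a hard cap just below 21 million BTC.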

While the number of Bitcoins in existence will never exceed 21 million, the money supply of Bitcoin can exceed 21 million due to fractional-reserve banking.


Source: en.bitcoin.it

As of June 23, 2017, Bitcoin had reached a total circulation of 16.4 million Bitcoins, about 78% of the eventual total of 21 million.

2017 research by Chainalysis showed that between 2.78 million and 3.79 million Bitcoins are already lost, or 17–23% of what has been mined to date.


How much Bitcoin exactly has been lost? That is a tough question, since there is no definitive metric for answering it. A good estimate is around 25% of all Bitcoin, according to this analysis. That research concluded that 30% of all coins had been lost, which equates to about 25% when adjusted for the current amount of coins in circulation; the adjustment is reasonable because the bulk of lost Bitcoins originate from the very early days, and as Bitcoin’s value has risen, people have been losing their coins at a slower rate.

With the advent of quantum computers, the future of Bitcoin might be perilous. One researcher has suggested that a quantum computer could calculate a private key from the public one in a minute or two. By learning all the private keys, someone would have access to all available Bitcoin. However, more extensive research shows that in the short term, the impact of quantum computers on the mining, security, and forking aspects of Bitcoin will likely be rather small.

It’s possible that an arms race between quantum hackers and quantum-safe Bitcoin developers will take place. One initiative has already tested the feasibility of a quantum-safe blockchain platform utilizing quantum key distribution across an urban fiber network.

The image below shows which encryption algorithms are vulnerable to quantum computing and which remain secure against it.


Source: cryptomorrow.com

And while work is still ongoing, three quantum-secure methods have been proposed as alternative encryption methodologies for the quantum computing age: lattice-based cryptography, code-based cryptography, and multivariate cryptography. IOTA already deploys a Winternitz one-time signature (OTS) scheme, a hash-based design related to Lamport signatures, claiming resistance to quantum computer algorithms when sufficiently large hash functions are used.
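To illustrate why hash-based signatures are believed to resist quantum attacks, here is a toy Lamport one-time signature in Python (a didactic sketch only; production schemes such as Winternitz OTS are far more compact). Its security rests on the one-wayness of the hash function rather than on factoring or discrete logarithms, the problems Shor’s algorithm breaks.

```python
import hashlib, os

def H(data):
    return hashlib.sha256(data).digest()

def keygen():
    # 256 pairs of random secrets; the public key is their hashes.
    sk = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def bits(msg):
    # The 256 bits of the message digest select which secret to reveal.
    digest = H(msg)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(msg, sk):
    return [sk[i][bit] for i, bit in enumerate(bits(msg))]

def verify(msg, sig, pk):
    return all(H(s) == pk[i][bit] for i, (s, bit) in enumerate(zip(sig, bits(msg))))
```

Each key pair must be used for exactly one message: signing reveals half of the secrets, so reuse leaks information.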

The no-cloning theorem makes it impossible to copy and distribute a decentralized ledger of qubits (quantum units of information). Because qubits cannot be copied or non-destructively read, they would act more like physical coins, with no double-spending problem. Quantum Bitcoin miners might support the network by performing operations that amount to quantum error correction (which could replace current Proof-of-Work or Proof-of-Stake systems), since quantum entanglement would let all network participants simultaneously agree on a measurement result without a proof-of-work system.

And while we wait for a quantum-era Satoshi to rise, check out this theoretical account of how quantum computers may potentially create Bitcoin, which also contains primers on quantum computers and Bitcoin mining.

P.S. Satoshi is estimated to be in possession of over one million coins.


 


How GANs can turn AI into a massive force



 

Deep learning models can already achieve state-of-the-art results in some applications, but their capabilities remain limited. Unlike humans, deep learning models cannot handle even minor changes to their inputs, and so can only be applied to specific, narrowly defined tasks.

Consider this conversation, produced by what might be the most sophisticated negotiation software on the planet, between two AI agents developed at Facebook:

Bob: “I can can I I everything else.”

Alice: “Balls have zero to me to me to me to me to me to me to me to me to.”

At first, they were speaking in plain old English, but the researchers realized they had forgotten to include a reward for sticking to the language. So the AI agents began to diverge, eventually rearranging legible words into seemingly nonsensical (but, from their perspective, highly efficient) sentences. They invented their own codewords, abbreviations, and structures.

This phenomenon is observed again and again and again.

A vanguard AI technology that can learn, recognize, and generate information on a nearly human level doesn’t exist yet, but we have taken steps in that direction.

What are generative adversarial networks (GANs)?

Generally intelligent systems must be able to generalize from limited data and learn causal relationships. In 2014, Ian Goodfellow, a fellow at Google Brain, proposed generative adversarial networks (GANs) as an alternative unsupervised machine learning method, aiming to address many of the pain points of existing approaches.

GANs consist of two deep neural networks: a generator and a discriminator. The generator’s goal is to create data samples that are indistinguishable from real ones. The discriminator’s goal is to identify which of the generator’s data samples are real and which are fake.

These two networks compete against each other in a zero-sum game (one’s loss is the other’s win). Both networks thereby become stronger in a relatively short period of time.


Backpropagation is used to update the model parameters and train the neural networks. Over time, the networks learn many features of the provided data. To create realistic forged samples, the generator needs to learn the data’s features and patterns, while the discriminator does the same to correctly distinguish between real and fake samples.
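The training loop described above can be sketched in a few dozen lines of numpy. This is a deliberately tiny, hand-derived toy (all numbers and the data distribution are invented for illustration): the generator is a two-parameter affine map trying to mimic samples from N(4, 1), and the discriminator is a single logistic unit.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Generator G(z) = a*z + b maps noise to samples; it should learn to mimic
# the "real" data N(4, 1). Discriminator D(x) = sigmoid(w*x + c) outputs
# the probability that x is real.
a, b = 1.0, 0.0        # generator parameters
w, c = 0.1, 0.0        # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # Discriminator ascent on log D(real) + log(1 - D(fake))
    s_r, s_f = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - s_r) * real - s_f * fake)
    c += lr * np.mean((1 - s_r) - s_f)

    # Generator ascent on log D(fake) (the "non-saturating" loss)
    s_f = sigmoid(w * fake + c)
    grad = (1 - s_f) * w          # d log D(fake) / d fake_sample
    a += lr * np.mean(grad * z)
    b += lr * np.mean(grad)

print(round(b, 2))
```

Even at this scale the characteristic GAN dynamic appears: the two updates pull against each other, and the generator’s offset b should drift toward the real mean of 4 while the pair oscillates around equilibrium.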

GANs are thus able to overcome the above weaknesses by training (i.e. playing) neural networks against each other: each learns from the other (which requires less data), eventually performing better across a broader range of problems.

Applications of GANs

There are several types of GANs, and some of their most obvious applications include high-resolution or interactive image generation/blending, image inpainting, image-to-image translation, abstract reasoning, semantic segmentation, video generation, and text-to-image synthesis, among others.

The video game industry is the first area of entertainment to start seriously experimenting with using AI to generate raw content. There’s a huge cost incentive to invest in video game development automation, given the US$300 million+ budgets of modern AAA video games.

GANs have also been used for text, with less success: a bot developed to speak like Friedrich Nietzsche started to speak in a manner similar to the philosopher, but the sentences did not make sense. GANs for voice applications are able to turn a given text string into life-like speech from approximately 20 minutes of voice samples, such as these popular impersonations of American presidents Donald Trump and Barack Obama. In the near future, videos will likely be generated just by providing a script.

Goodfellow and his colleagues used GANs for image generation, recognition, and classification by teaching one of the networks to create images of handwritten digits (humans were not able to distinguish them from real handwritten digits). They also trained a neural network to create images of objects, which humans could differentiate from real ones only 78.7 percent of the time. Below are some sample images of faces created entirely by deep convolutional GANs.

Despite all the above achievements, GANs still have weaknesses:

  • Instability (the generator and the discriminator losses keep oscillating) and non-convergence (to optimum) of the objective function in GANs
  • Mode collapse (this happens when the generator doesn’t produce diverse images or information)
  • The possibility that either the generator or the discriminator becomes too strong compared to the other during training
  • The possibility that either the generator or the discriminator never learns beyond a certain point

An existential threat

Do GANs, and AI in general, pose an existential threat to humanity? Elon Musk thinks so. Since 2014, he has been advocating the adoption of AI regulations by authorities around the world, and he recently reiterated the urgent need to be proactive about regulation.

“AI is a fundamental risk to the existence of human civilization,” Musk told US politicians recently.

His concerns stem from the rapid developments related to GANs, which might push humanity toward the inception of artificial general intelligence. While AI regulations may serve as safeguards, AI is still far from the fictitious depictions seen frequently in Hollywood sci-fi movies.

(By the way, Facebook ultimately opted to require its negotiation bots to speak in plain old English.)

There are plenty of recommended resources online for getting started with GANs.

This article originally appeared on Tech in Asia.

Written by Hayk

January 26, 2018 at 7:47 pm

Brief overview: neural networks, architectures, frameworks


Deep learning is a new name for an approach to AI called neural networks, which have been going in and out of fashion for more than 70 years. Neural networks were first proposed in 1943 by Warren McCulloch and Walter Pitts, two researchers who moved to MIT in 1952 as founding members of what’s sometimes called the first cognitive science department.

Neural networks were a major area of research in both neuroscience and computer science until 1969, when, according to computer science lore, they were killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who became co-directors of the new MIT Artificial Intelligence Laboratory in 1970.

Neural networks are a means of doing machine learning, in which a computer learns to perform specific tasks by analysing training examples. Usually, these examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it would find visual patterns in the images that consistently correlate with particular labels.

Modelled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected. Most of today’s neural nets are organised into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.

Architecture and main types of neural networks

A typical neural network contains a large number of artificial neurons called units arranged in a series of layers.

  • Input layer contains units (artificial neurons) that receive input from the outside world: the data the network will learn about, recognise, or otherwise process.
  • Output layer contains units that present the network’s response to the information it has learned.
  • Hidden layers sit between the input and output layers. Their task is to transform the input into something the output units can use.
  • Perceptron is the simplest network: input units wired directly to one output unit, with no hidden layers; it is also called a single-layer perceptron.
  • Radial Basis Function Networks are similar to feed-forward neural networks, except that a radial basis function is used as the activation function of the neurons.
  • Multilayer Perceptron networks use more than one hidden layer of neurons. These are also known as deep feed-forward neural networks.
  • Recurrent Neural Networks’ (RNN) hidden-layer neurons have self-connections and thus possess memory. LSTM is a type of RNN.
  • Hopfield Networks are fully interconnected: each neuron is connected to every other neuron. The network is trained by setting the neurons’ values to the desired input pattern, after which the weights are computed and never changed. Once trained on one or more patterns, the network will converge to the nearest learned pattern.
  • Boltzmann Machine Networks are similar to Hopfield networks, except that some neurons serve as inputs while others remain hidden. The weights are initialized randomly and learned during training.
  • Convolutional Neural Networks (CNN) derive their name from the “convolution” operator. The primary purpose of convolution here is to extract features from an input image or video; it preserves the spatial relationship between pixels by learning features from small squares of input data.
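The layer vocabulary above maps directly onto code. A minimal numpy forward pass through a multilayer perceptron might look like this (the dimensions are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(0.0, x)

def forward(x, layers):
    """Feed-forward pass: each layer is a (weights, biases) pair."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:      # hidden layers get a nonlinearity
            x = relu(x)
    return x

# 4 input units -> 8 hidden units -> 3 output units
layers = [(rng.standard_normal((4, 8)) * 0.1, np.zeros(8)),
          (rng.standard_normal((8, 3)) * 0.1, np.zeros(3))]
out = forward(rng.standard_normal((5, 4)), layers)   # batch of 5 samples
print(out.shape)   # (5, 3)
```

Each `(W, b)` pair is one layer of units; stacking more pairs gives a deeper feed-forward network.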

Of these, let’s have a very brief review of CNNs and RNNs, as these are the most commonly used.

CNN

  1. CNNs are ideal for image and video processing.
  2. A CNN takes a fixed-size input and generates fixed-size outputs.
  3. Use CNNs to break a component (image/video) into subcomponents (lines, curves, etc.).
  4. A CNN is a type of feed-forward artificial neural network, a variation of the multilayer perceptron designed to require minimal preprocessing.
  5. CNNs use a connectivity pattern between neurons inspired by the organization of the animal visual cortex, whose neurons are arranged so that they respond to overlapping regions tiling the visual field.
  6. CNN looks for the same patterns on all the different subfields of the image/video.

RNN

  1. RNNs are ideal for text and speech analysis.
  2. RNN can handle arbitrary input/output lengths.
  3. Use RNNs to create combinations of subcomponents (image captioning, text generation, language translation, etc.).
  4. An RNN, unlike a feed-forward neural network, can use its internal memory to process arbitrary sequences of inputs.
  5. RNNs use time-series information: what was done last affects what is done next.
  6. An RNN, in the simplest case, feeds the hidden layer from the previous step back in as an additional input at the next step; while it builds up memory in this process, it is not looking for the same patterns.

Two widely used types of RNN are the LSTM and the GRU. The key difference between them is that a GRU has two gates (reset and update) whereas an LSTM has three (input, output, and forget). GRU is similar to LSTM in that both utilise gating information to address the vanishing gradient problem. GRU performance is on par with LSTM, but GRUs are computationally more efficient.

  • GRUs train faster and perform better than LSTMs on less training data if used for language modelling.
  • GRUs are simpler and easier to modify, for example adding new gates in case of additional input to the network.
  • In theory, LSTMs remember longer sequences than GRUs and outperform them in tasks requiring modelling long-distance relations.
  • GRUs expose their complete memory, unlike LSTMs.
  • It’s recommended to train both GRU and LSTM and see which is better.
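For concreteness, one common formulation of a single GRU step, with its two gates, can be sketched in numpy as follows (gate-equation conventions vary slightly across papers, and bias terms are omitted here for brevity):

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def gru_step(x, h, p):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])              # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])              # reset gate
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate state
    return (1 - z) * h + z * h_tilde                    # interpolate old/new

rng = np.random.default_rng(0)
d_in, d_h = 3, 5
p = {k: rng.standard_normal(shape) * 0.1
     for k, shape in [("Wz", (d_in, d_h)), ("Uz", (d_h, d_h)),
                      ("Wr", (d_in, d_h)), ("Ur", (d_h, d_h)),
                      ("Wh", (d_in, d_h)), ("Uh", (d_h, d_h))]}

h = np.zeros(d_h)
for x in rng.standard_normal((7, d_in)):   # a sequence of 7 inputs
    h = gru_step(x, h, p)
print(h.shape)   # (5,)
```

The update gate z interpolates between the previous state and the candidate state, which is how the GRU carries memory across steps without the LSTM’s separate cell state.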

Deep learning frameworks

There are several frameworks that provide advanced AI/ML capabilities. How do you determine which framework is best for you?

The figure below summarises the most popular open-source deep learning repositories, ranked by the number of stars awarded by developers on GitHub (as of May 2017).

deep learning frameworks ranked via GitHub

Google’s TensorFlow is a library developed at Google Brain. TensorFlow supports a broad set of capabilities, such as image, handwriting, and speech recognition, forecasting, and natural language processing (NLP). Its programming interfaces include Python and C++, with alpha releases of Java, Go, R, and Haskell APIs to be supported soon.

Caffe is the brainchild of Yangqing Jia, who leads engineering for Facebook AI. Started in late 2013, Caffe was the first mainstream industry-grade deep learning toolkit. Due to its excellent convolutional model, it is one of the most popular toolkits in the computer vision community. Its speed makes Caffe perfect for research experiments and commercial deployment; however, it does not support the fine-grained network layers found in TensorFlow and Theano. Caffe can process over 60M images per day on a single Nvidia K40 GPU. It is cross-platform, supports C++, Matlab, and Python programming interfaces, and has a large user community that contributes to its own repository, known as the “Model Zoo.” AlexNet and GoogLeNet are two popular user-made networks available to the community.

Caffe 2 was unveiled in April 2017, focusing on modularity and excelling at mobile and large-scale deployments. Like TensorFlow, Caffe 2 supports the ARM architecture using the C++ Eigen library, and it continues to offer strong support for vision-related problems while adding RNN and LSTM networks for NLP, handwriting recognition, and time-series forecasting.

MXNet is a fully featured, programmable, and scalable deep learning framework, which offers the ability to mix programming models (imperative and declarative) and to code in Python, C++, R, Scala, Julia, Matlab, and JavaScript. MXNet supports CNNs and RNNs, including LSTM networks, and provides excellent capabilities for imaging, handwriting and speech recognition, forecasting, and NLP. It is considered one of the world’s best image classifiers and supports GAN simulations, which are used in experimental economics methods involving Nash equilibria. Amazon supports MXNet and plans to use it in existing and upcoming services, while Apple is rumored to be using it as well.

Theano’s architecture lacks the elegance of TensorFlow, but it provides capabilities such as a symbolic API supporting looping control (the so-called scan), which makes implementing RNNs easy and efficient. Theano supports many types of convolutions for handwriting and image classification, including medical images, and uses 3D convolution/pooling for video classification. It can handle natural language processing tasks, including language understanding, translation, and generation, and it supports GANs.

 

How to conduct an Initial Coin Offering (ICO) – the checklist


DISCLAIMER: This is a perpetual WORK IN PROGRESS and does not claim to be comprehensive; it is meant to serve as a guide. We welcome any feedback, especially suggestions for improvement from companies that have completed a (successful) ICO. The suggested approaches and numbers in the checklist are not carved in stone; they are guidelines. Lastly, information (names of people, entities, numbers) not present in the checklist will be shared only on explicit request, on a case-by-case basis. USE the information below and in the checklist at your own risk, for your own benefit and guidance.

Context and mania

The amount of money being raised through Initial Coin Offerings (ICOs) has quintupled since May 2017. The four largest ICOs to date – Filecoin ($206M), Tezos ($232M), EOS ($180M), and Bancor ($154M) – have raised $772 million between them. We are experiencing a bubble, but not as crazy a one as the $8 trillion of excess market capitalisation during the dot-com era. With the proliferation of ICOs and tokens, the era of zombie tokens is also upon us. You can check ratings of new and ongoing ICOs here.


Coindesk: Over $3.5 billion dollars have been raised to date via ICOs

It was a hot summer, with $462M raised in June 2017 and $575M in July 2017; the peak was reached in September 2017 with a whopping $663M of ICO funding.

ICO regulations are coming .. and the checklist

ICO mania started cooling after September 4, 2017 when the People’s Bank of China placed a temporary ban on ICOs.

In view of the ICO and blockchain mania, the SEC has issued guidelines and statements and has already charged two ICOs with fraud. Tezos has been hit with two class-action lawsuits. Singapore’s MAS and Malaysia’s SC have already highlighted risks and issued preliminary guidelines related to ICOs. Other regulators will also tighten compliance and regulatory guidelines further in the next few months. Projects such as SAFT (Simple Agreement for Future Tokens) help navigate US laws.

OK, so there are six main aspects to an Initial Coin Offering:

  1. Team/Advisors
  2. Technology
  3. Product/Platform
  4. Business Model
  5. Legal/Regulation
  6. Marketing/Roadshow and Investor Relations

And most companies differentiate between pre-ICO, ICO and post-ICO stages of activities.

With the above points in mind, here is a draft ICO checklist. Use, benefit and be successful!

Note: This ICO checklist was created in collaboration with Nikita Akimov whose current platform has 1.2 million MAUs and is currently doing its ICO.

P.S. Depending on the type of business/product/platform, I might be able to share a list of crypto funds and investors.

Written by Hayk

December 26, 2017 at 9:11 am

How AI defeated top poker players


Poker is a game of imperfect information: such games model settings where players have private information. Huge progress has been made in solving these games over the past 20 years, especially since the Annual Computer Poker Competition was established in 2006. Before 2006, general-purpose linear programming solvers (example) and sequence-form representation (example) were used to solve small variants of poker, or coarse abstractions of two-player limit Texas Hold’em.

Since 2006, two more scalable equilibrium-finding algorithms and problem representations have been developed for two-player zero-sum games. One family is based on smoothed gradient descent algorithms and a decomposed problem representation. The other family, counterfactual regret minimisation (CFR), is based on a form of self-play using no-regret learning, adapted so that regret updates can be computed at each information set separately, instead of requiring regrets to be updated for entire game strategies.

The best available guarantees for CFR require on the order of 1/ε² iterations over the game tree to reach an ε-equilibrium, that is, strategies such that no player can be exploited by more than ε by any strategy. The gradient-based algorithms require only on the order of 1/ε or log(1/ε) iterations; the latter matches the optimal number of iterations required. On the other hand, more effective sampling techniques have been developed for CFR than for the gradient-based algorithms, so quick approximate iterations can be used.
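The regret-minimisation idea underlying CFR can be shown on a game far smaller than poker. Below is an illustrative regret-matching self-play loop for rock-paper-scissors (the full CFR algorithm applies this per information set across the game tree): each player accumulates regret for the actions it did not play, and the average strategy converges toward the Nash equilibrium.

```python
import numpy as np

# Rock-paper-scissors payoffs for player 0: PAYOFF[i, j] is the utility of
# playing action i against the opponent's action j.
PAYOFF = np.array([[ 0., -1.,  1.],
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])

def strategy_from(regrets):
    """Regret matching: play actions in proportion to positive regret."""
    pos = np.maximum(regrets, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1 / 3)

regrets = np.array([[1.0, 0.0, 0.0],   # start slightly off-uniform so the
                    [0.0, 1.0, 0.0]])  # self-play dynamics are visible
strat_sum = np.zeros((2, 3))           # the *average* strategy converges

for _ in range(100_000):
    s0, s1 = strategy_from(regrets[0]), strategy_from(regrets[1])
    strat_sum[0] += s0
    strat_sum[1] += s1
    u0 = PAYOFF @ s1          # expected utility of each pure action for p0
    u1 = -(s0 @ PAYOFF)       # zero-sum: p1's utilities are the negation
    regrets[0] += u0 - s0 @ u0
    regrets[1] += u1 - s1 @ u1

avg = strat_sum / strat_sum.sum(axis=1, keepdims=True)
print(np.round(avg, 2))   # both rows approach the uniform Nash [1/3, 1/3, 1/3]
```

Note that it is the average strategy, not the current one, that converges; the instantaneous strategies can cycle indefinitely.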

How to solve imperfect-information games

Currently, the main approach for solving imperfect-information games is shown in the image below. First, the game is abstracted to generate a smaller but strategically similar game, reducing it to a size that can be tackled with an equilibrium finding algorithm.

Then, the abstract game is solved for an equilibrium or near-equilibrium. A Nash equilibrium defines a notion of rational play: it is a profile of strategies, one per player, such that no player can increase his/her expected payoff by switching to a different strategy. A strategy for a player specifies, for each information set where it is the player’s turn, the probability with which the player should select each available action.

An information set is a collection of game states that cannot be distinguished by the player whose turn it is because of private information of other players. Finally, the strategies from the abstract game are mapped back to the original game.

 

Source: Science Magazine

 

Two main kinds of abstraction are used. One is information abstraction, where it is assumed in the abstract game that a player does not know some information that he/she actually knows. Lossless abstraction algorithms yield an abstract game from which each equilibrium is also an equilibrium in the original game, and typically reduce the size of poker (or other such) games by 1-2 orders of magnitude.

The second method, action abstraction, removes some actions from consideration in the abstract game, and is useful when the number of actions that a player can choose is large.

Libratus vs. top poker players

Previously, AI had beaten humans at chess, checkers, Go, and Jeopardy, but it managed to beat top players at poker only in January 2017. Unlike chess or Go, poker is a game of imperfect information and requires a different methodology to tackle it.

In a 20-day competition involving 120,000 hands at Rivers Casino in Pittsburgh during January 2017, Libratus became the first AI to defeat top human players at Heads-up no-limit Texas Hold’em—the primary benchmark and long-standing challenge problem for imperfect-information game-solving by AIs.

Libratus beat a team of four top poker professionals in heads-up no-limit Texas Hold’em, which has 6.38 × 10^161 decision points. It played a separate two-player game against each player and collectively amassed about $1.8 million in chips. It used the above-mentioned approach of simplifying and abstracting the game, finding an equilibrium, and then mapping the abstract game back to the original one while adding details and improving the overall strategy. Libratus includes three main parts:

  1. An algorithm for computing a blueprint for the overall strategy (an approximate Nash equilibrium) of a smaller, simpler abstraction of the game, using a precomputed decision tree of about 10^13 decision points instead of the 10^161 points in the full game. It starts with a simple weighted decision tree from which to select its moves depending on its hole cards and those on the board. One example of these simpler abstractions is grouping similar hands, such as a King-high flush and a Queen-high flush, or bets of $100 and $105, and treating them alike.
  2. Algorithm that fleshes out the details of the strategy for earlier subgames that are reached or realised during a play, and a coarse strategy for the later rounds based on assumed realization of the earlier ones. Whenever an opponent makes a move that is not in the abstraction, the module computes a solution to this subgame that includes the opponent’s move.
  3. Self-improver algorithm that solves potential weaknesses opponents have identified in the game’s strategy. Typically, AIs use ML to find mistakes in the opponent’s strategy and exploit them. But that also opens the AI to exploitation if the opponent shifts strategy. Instead, Libratus’ self-improver module analyses opponents’ bet sizes to detect potential holes in Libratus’ strategy. Libratus then adds these missing decision branches, computes probabilities and strategies for them, and adds them to the existing strategy.

This strategy is called the blueprint strategy.

Libratus is computationally expensive and was powered by the Bridges system, a high-performance computer that could achieve, at maximum, 1.35 Pflops. Libratus burned through approximately 19 million core hours of computing throughout the tournament. In addition to beating the human experts, Libratus has also won against the previous AI champion, Baby Tartanian8.

Another AI capable of playing heads-up no-limit Texas Hold’em, DeepStack, uses a similar algorithm, continual re-solving, but it has not been tested against top professional players.

Most of the same abstraction techniques apply for games with more than two players that are not zero-sum, but their equilibrium-finding problems are such that no polynomial-time algorithm is known. It is not even clear that finding a Nash equilibrium is the right goal in such games. Different equilibria can have different values to the players.

This AI could be used for calculating strategic decisions in the real world, such as in finance and information security.

Reinforcement learning and its new frontiers


RL’s origins and historic context

RL copies a very simple principle from nature. The psychologist Edward Thorndike documented it more than 100 years ago. Thorndike placed cats inside boxes from which they could escape only by pressing a lever. After a considerable amount of pacing around and meowing, the animals would eventually step on the lever by chance. Once they learned to associate this behaviour with the desired outcome, they escaped with increasing speed.

Some of the earliest AI researchers believed that this process might be usefully reproduced in machines. In 1951, Marvin Minsky, a student at Harvard who would become one of the founding fathers of AI, built a machine that used a simple form of reinforcement learning to mimic a rat learning to navigate a maze. Minsky’s Stochastic Neural Analogy Reinforcement Computer (SNARC) consisted of dozens of tubes, motors, and clutches that simulated the behaviour of 40 neurons and synapses. As a simulated rat made its way out of a virtual maze, the strength of some synaptic connections would increase, thereby reinforcing the underlying behaviour.

There were few successes over the next few decades. In 1992, Gerald Tesauro demonstrated a program that used the technique to play backgammon. It became skilled enough to rival the best human players, a landmark achievement in AI. But RL proved difficult to scale to more complex problems.

In March 2016, however, AlphaGo, a program trained using RL, won against one of the best Go players of all time, South Korea’s Lee Sedol. This milestone event reopened the Pandora’s box of RL research. It turns out the key to strong RL is to combine it with deep learning.

Current usage and major methods of RL

Thanks to current RL research, computers can now automatically learn to play Atari games and beat world champions at Go, simulated quadrupeds are learning to run and leap, and robots are learning to perform complex manipulation tasks that defy explicit programming.

However, while RL’s advancements have accelerated, progress has not been driven so much by new ideas or additional research as simply by more data, processing power, and infrastructure. In general, there are four separate factors that hold back AI:

  1. Processing power (the obvious one: Moore’s Law, GPUs, ASICs),
  2. Data (in a specific form, not just somewhere on the internet – e.g. ImageNet),
  3. Algorithms (research and ideas, e.g. backprop, CNN, LSTM), and
  4. Infrastructure (Linux, TCP/IP, Git, AWS, TensorFlow,..).

Similar patterns hold for RL and the fields it draws on. In computer vision, for example, the 2012 AlexNet was a deeper and wider version of 1990s Convolutional Neural Networks (CNNs). Likewise, the Atari Deep Q-Learning result is an implementation of the standard Q-Learning algorithm with function approximation, where the function approximator is a CNN. And AlphaGo uses Policy Gradients with Monte Carlo Tree Search (MCTS).
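To make the Q-Learning building block concrete, here is a minimal tabular sketch on a toy 5-state chain environment. This is a stand-in illustration, not the Atari setup: in DQN, the CNN simply replaces the table below with a learned approximator.

```python
import random

# Tabular Q-learning on a toy 5-state chain: moving right from the
# last state pays reward 1 and resets the episode; all else pays 0.
N_STATES = 5
ACTIONS = [0, 1]                       # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

def step(state, action):
    """Toy environment dynamics."""
    if action == 1 and state == N_STATES - 1:
        return 0, 1.0                  # goal reached: reward, then reset
    if action == 1:
        return state + 1, 0.0
    return max(state - 1, 0), 0.0

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose(state):
    # epsilon-greedy action selection with random tie-breaking
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

state = 0
for _ in range(5000):
    action = choose(state)
    nxt, reward = step(state, action)
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    state = nxt

# the learned greedy policy moves right in every state
policy = [1 if Q[(s, 1)] > Q[(s, 0)] else 0 for s in range(N_STATES)]
```

DQN replaces the `Q` dictionary with a CNN mapping pixels to action values, plus tricks (replay memory, target networks) to keep the approximation stable.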

RL’s best-performing methods vs. human learning

Generally, RL approaches can be divided into two core categories. The first focuses on finding the optimum mappings that perform well in the problem of interest; genetic algorithms, genetic programming and simulated annealing have been commonly employed in this class of RL approaches. The second estimates the utility function of taking an action in the given problem via statistical techniques or dynamic programming methods, such as TD(λ) and Q-learning. To date, RL has been successfully applied in many real-world complex applications, including autonomous helicopters, humanoid robotics, autonomous vehicles, etc.

Policy Gradients (PGs), one of RL’s most used methods, are shown to work better than Q-Learning when tuned well. PGs are preferred because there is an explicit policy and a principled approach that directly optimises the expected reward.
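As a rough illustration of what “directly optimising the expected reward” means, here is a minimal REINFORCE-style sketch on a two-armed bandit, a deliberately trivial stand-in problem:

```python
import math
import random

# REINFORCE on a 2-armed bandit: arm 1 always pays 1.0, arm 0 pays 0.2.
# The policy is a softmax over two logits, updated in the direction of
# (reward - baseline) * grad log pi(action).
random.seed(1)
logits = [0.0, 0.0]
LR = 0.1
baseline = 0.0                  # running average reward (variance reduction)

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(2000):
    probs = softmax(logits)
    arm = 0 if random.random() < probs[0] else 1   # sample from the policy
    reward = 1.0 if arm == 1 else 0.2
    baseline += 0.01 * (reward - baseline)
    advantage = reward - baseline
    # d/dlogit_a of log pi(arm) = (1 if a == arm else 0) - pi(a)
    for a in (0, 1):
        grad_log = (1.0 if a == arm else 0.0) - probs[a]
        logits[a] += LR * advantage * grad_log

probs = softmax(logits)         # ends up strongly favouring the better arm
```

The same update rule drives Pong-playing agents; the only difference is that the policy becomes a neural network over pixels and rewards arrive only at the end of a rollout.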

Before trying PGs (the cannon), it is recommended to first try the cross-entropy method (CEM) (the normal gun), a simple stochastic hill-climbing “guess and check” approach inspired loosely by evolution. And if you really need to, or insist on, using PGs for your problem, use a variation called TRPO, which usually works better and more consistently than vanilla PG in practice. The main idea is to avoid parameter updates that change the policy too dramatically, enforced by a constraint on the KL divergence between the distributions predicted by the old and the new policies.
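CEM fits in a few lines. The sketch below maximises a simple quadratic as a placeholder objective; in an RL setting, each sample would be a vector of policy parameters scored by rollout returns:

```python
import random
import statistics

# Cross-entropy method: sample candidates from a Gaussian, keep the
# elite fraction, refit the Gaussian to the elites, repeat.
random.seed(0)

def objective(x):
    return -(x - 3.0) ** 2      # placeholder objective, maximised at x = 3

mu, sigma = 0.0, 5.0
N_SAMPLES, N_ELITE = 50, 10
for _ in range(30):
    samples = [random.gauss(mu, sigma) for _ in range(N_SAMPLES)]
    elites = sorted(samples, key=objective, reverse=True)[:N_ELITE]
    mu = statistics.mean(elites)
    sigma = statistics.stdev(elites) + 1e-3   # small floor avoids collapse
```

After 30 iterations `mu` sits essentially at the optimum. The appeal is exactly what the text says: no gradients, just guess-and-check with a shrinking search distribution.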

PGs, however, have a few disadvantages: they typically converge to a local rather than a global optimum, and they are inefficient and high-variance when evaluating a policy. PGs also require a lot of training samples, take a long time to train, and are hard to debug when they don’t work.

PG is a fancy form of guess-and-check, where the “guess” refers to sampling rollouts from the current policy and encouraging actions that lead to good outcomes. This represents the state of the art in how we currently approach RL problems. But compare that to how a human might learn (e.g. a game of Pong). You show them the game and say something along the lines of “You’re in control of a paddle and you can move it up or down, and your goal is to bounce the ball past the other player”, and you’re set and ready to go. Notice some of the differences:

  • Humans communicate the task/goal in language (e.g. English), but in a standard RL case you assume an arbitrary reward function that has to be discovered through environment interactions. It can be argued that if a human went into a game without knowing anything about the reward function, the human would have a lot of difficulty learning what to do, while a PG would be indifferent and would likely work much better.
  • A human brings in a huge amount of prior knowledge, such as elementary physics (concepts of gravity, constant velocity, …) and intuitive psychology. They also understand the concept of being “in control” of a paddle, and that it responds to UP/DOWN key commands. In contrast, algorithms start from scratch, which is simultaneously impressive (because it works) and depressing (because we lack concrete ideas for how not to).
  • PGs are a brute force solution, where the correct actions are eventually discovered and internalised into a policy. Humans build a rich, abstract model and plan within it.
  • PGs have to actually experience a positive reward, and experience it very often, in order to eventually shift the policy parameters towards repeating moves that give high rewards. Humans, on the other hand, can figure out what is likely to give rewards without ever experiencing the rewarding or unrewarding transition.

In games/situations with frequent reward signals that require precise play, fast reflexes, and not much planning, PGs can quite easily beat humans. So once we understand the “trick” by which these algorithms work, we can reason through their strengths and weaknesses.

PGs don’t easily scale to settings where huge amounts of exploration are required. Instead of sampling from a stochastic policy and encouraging the samples that get higher scores, deterministic policy gradients use a deterministic policy and get gradient information directly from a second network (called a critic) that models the score function. This approach can in principle be much more efficient in settings with high-dimensional actions where sampling provides poor coverage, but so far it seems empirically slightly finicky to get working.

There is also a line of work that tries to make the search process less hopeless by adding additional supervision. In many practical cases, for instance, one can obtain expert trajectories from a human. For example, AlphaGo first uses supervised learning to predict human moves from expert Go games, and the resulting human-mimicking policy is later fine-tuned with PGs on the “real” goal of winning the game.

RL’s new frontiers: MAS, PTL, evolution, memetics and eTL

There is another method called Parallel Transfer Learning (PTL), which aims to optimize RL in multi-agent systems (MAS). MAS are computer systems composed of many interacting, autonomous agents within an environment of interest, used for problem-solving. MAS have a wide array of applications in industrial and scientific fields, such as resource management and computer games.

In MAS, as agents interact with and learn from one another, the challenge is to identify suitable source tasks from multiple agents that contain mutually useful information to transfer. In conventional MAS (cMAS), which suit simple environments, the actions of each agent are pre-defined for the possible states of the environment. Standard RL methodologies have been used as the learning processes of cMAS agents through trial-and-error interactions in a dynamic environment.

In PTL, each agent broadcasts its knowledge to all other agents while deciding whose knowledge to accept, based on the reward received from other agents versus the expected rewards it predicts. Nevertheless, agents in this approach tend to infer incorrect actions in unseen circumstances or complex environments.
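The broadcast-and-accept loop described above might be sketched roughly as follows. The class, method names and the reward-comparison rule here are illustrative assumptions, not the published PTL algorithm:

```python
# Illustrative sketch of a broadcast-and-accept rule in the spirit of PTL:
# an agent accepts a peer's knowledge only when the reward reported by the
# peer exceeds the agent's own expected reward. All names are assumed.
class Agent:
    def __init__(self, name, expected_reward):
        self.name = name
        self.expected_reward = expected_reward
        self.knowledge = {name: expected_reward}  # stand-in for a policy

    def broadcast(self):
        return self.name, self.knowledge, self.expected_reward

    def consider(self, knowledge, observed_reward):
        # Accept only if the peer's observed reward beats our expectation.
        if observed_reward > self.expected_reward:
            self.knowledge.update(knowledge)
            return True
        return False

agents = [Agent("a", 0.3), Agent("b", 0.7), Agent("c", 0.5)]
accepted = []
for sender in agents:
    name, knowledge, reward = sender.broadcast()
    for receiver in agents:
        if receiver is not sender and receiver.consider(knowledge, reward):
            accepted.append((receiver.name, name))
```

In this toy run, the weakest agent accepts from both stronger peers, and the strongest accepts from no one, which is exactly the failure mode the text notes: acceptance is driven by past rewards, which say nothing about unseen circumstances.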

However, for more complex or changing environments, it is necessary to endow the agents with intelligence capable of adapting to an environment’s dynamics. A complex environment, almost by definition, implies complex interactions and necessitates learning in MAS that current RL methodologies are hard-pressed to deliver. The more recent machine learning paradigm of Transfer Learning (TL) was introduced as an approach for leveraging valuable knowledge from related and well-studied problem domains to enhance the problem-solving abilities of MAS in complex environments. Since then, TL has been successfully used to enhance RL tasks via methodologies such as instance transfer, action-value transfer, feature transfer and advice exchanging (AE).

Most RL systems aim to train a single agent or a cMAS. The Evolutionary Transfer Learning framework (eTL) aims to develop intelligent, social agents capable of adapting to the dynamic environment of MAS and of more efficient problem solving. It is inspired by Darwin’s theory of evolution (natural selection + random variation) and by principles that govern the evolutionary knowledge transfer process. eTL constructs social selection mechanisms modelled after the principles of human evolution. It mimics natural learning and the errors introduced by the physiological limits of agents’ ability to perceive differences, thus generating “growth” and “variation” of the knowledge agents hold, and thereby exhibiting higher adaptability for complex problem solving. The essential backbone of eTL is the memetic automaton, which includes evolutionary mechanisms such as meme representation, meme expression, etc.

Memetics


The term “meme” can be traced back to Dawkins’ “The Selfish Gene”, where he defined it as “a unit of information residing in the brain and is the replicator in human cultural evolution.” For the past few decades, the meme-inspired science of memetics has attracted increasing attention in fields including anthropology, biology, psychology, sociology and computer science. One of its most direct and simplest applications in computer science for problem solving has been the memetic algorithm. Further research into meme-inspired computational models resulted in the concept of the memetic automaton, which integrates memes into units of domain information useful for problem-solving. Recently, memes have been defined as transformation matrices that can be reused across different problem domains for enhanced evolutionary search. Just as genes serve as “instructions for building proteins”, memes carry “behavioural instructions”, constructing models for problem solving.


Memetics in eTL


Meme representation and meme evolution form the two core aspects of eTL; a meme then undergoes meme expression and meme assimilation. Meme representation concerns what a meme is; meme expression is defined for an agent to express its stored memes as behavioural actions; and meme assimilation captures new memes by translating the corresponding behaviours into knowledge that blends into the agent’s mind-universe. The meme evolution processes (i.e. meme internal and meme external evolution) comprise the main behavioural learning aspects of eTL. Specifically, meme internal evolution denotes the process by which agents update their mind-universe via self-learning or personal grooming. In eTL, all agents undergo meme internal evolution by exploring the common environment simultaneously. During meme internal evolution, meme external evolution may occur to model the social interaction among agents, mainly via imitation, which takes place when memes are transmitted. Meme external evolution happens whenever the current agent identifies a suitable teacher agent via a meme selection process. Once the teacher agent is selected, meme transmission occurs to instruct how the agent imitates others. During this process, meme variation facilitates knowledge transfer among agents. Upon receiving feedback from the environment after performing an action, the agent updates its mind-universe accordingly.


eTL implementation with learning agents


There are two implementations of learning agents that take the form of neurally-inspired learning structures, namely FALCON and a BP multilayer neural network. Specifically, FALCON is a natural extension of the self-organizing neural models proposed for real-time RL, while BP is a classical multi-layer backpropagation network that has been widely used in various learning systems. In experimental comparisons:
  1. MAS with TL vs. MAS without TL: Most TL approaches outperform cMAS. This is because TL endows agents with the capacity to benefit from knowledge transferred from better-performing agents, thus accelerating the agents’ learning rate in solving complex tasks more efficiently and effectively.
  2. eTL vs. PTL and other TL approaches: FALCON and BP agents with eTL outperform PTL and other TL approaches because, when deciding whether to accept information broadcast by others, agents in PTL tend to make incorrect predictions in previously unseen circumstances. Further, eTL attains higher success rates than all AE models thanks to its meme selection operator, which fuses the “imitate-from-elitist” and “like-attracts-like” principles so as to give agents the option of choosing more reliable teacher agents than under the AE model.

Conclusions

While the popularisation of RL is traced back to Edward Thorndike and Marvin Minsky, it was inspired by nature and has been with us humans for ages. It is how we effectively teach children, and how we now want to teach our computer systems, whether real (neural networks) or simulated (MAS).

RL re-entered public consciousness and rekindled our interest in 2016, when AlphaGo beat Go champion Lee Sedol. Via its currently successful PGs, DQNs and other methodologies, RL has already contributed to, and continues to accelerate, make more intelligent and optimise, humanoid robotics, autonomous vehicles, hedge funds, and other endeavours, industries and aspects of human life.

However, what is it that optimises or accelerates RL itself? Its new frontiers are PTL, memetics and the holistic eTL methodology inspired by natural evolution and the spreading of memes. This evolutionary (and revolutionary!) approach is governed by several meme-inspired evolutionary operators (implemented using FALCON and BP multi-layer neural networks), including the meme evolutions.

In performance efficacy, eTL seems to outperform even state-of-the-art MAS TL systems such as PTL.

What future does RL hold? We don’t know. But the research resources, experimentation and imaginative thinking being poured into it will surely not disappoint us.

Bitcoin, ICOs, Mississippi Bubble and crypto future

with one comment

Bitcoin bubble

Bitcoin has risen 10x in value so far in 2017, the largest gain of all asset classes, prompting sceptics to declare it a classic speculative bubble that could burst, like the dotcom boom or the US sub-prime housing crash that triggered the global financial crisis. Stocks in the dotcom crash were worth $2.9tn before collapsing in 2000, whereas the market cap of Bitcoin currently (as of 03.12.2017) stands at $185bn, which could signal there is more room for the bubble to grow.


Many financiers and corporate stars think there is both a bubble and a huge opportunity. One of the biggest Bitcoin bulls on Wall Street, Mike Novogratz, thinks cryptocurrencies are in a massive bubble (yet anticipates Bitcoin reaching $40,000 by the end of 2018). Ironically (or not), he is launching a $500 million fund, the Galaxy Digital Assets Fund, to invest in them, signalling a growing acceptance of cryptocurrencies as legitimate investments. John McAfee has doubled down on his confidence in Bitcoin by stating his belief it will be worth $1 million by the end of 2020.


Former Fed Chairman Alan Greenspan has said that “you have to really stretch your imagination to infer what the intrinsic value of bitcoin is,” calling the cryptocurrency a “bubble.” Even financial heavyweights such as CME, the world’s leading derivatives marketplace, are planning to tap into this gold rush by introducing Bitcoin derivatives, which will let hedge funds into the market before the end of 2017.


The practical application of cryptocurrencies to facilitating legal commerce appears hampered, at this juncture, by relatively expensive transaction fees and the skyrocketing energy costs associated with mining. On this note, Nobel Prize-winning economist Joseph Stiglitz thinks that Bitcoin “ought to be outlawed” because it does not serve any socially useful function and yet consumes enormous resources.

Bitcoin mania has many parallels with Mississippi Bubble

Bitcoin’s boom has gone further than famous market manias of the past like the tulip craze or the South Sea Bubble, and has lasted longer than the dancing epidemic that struck 16th-century France or the dot-com bubble of 2000. Like many such events, the South Sea Bubble was ultimately a scheme: no real-economy trade could reasonably take place, but the company’s stock kept rising on promotion and the hopes of investors.


In my view, a more illustrative example, with many parallels to Bitcoin, is the Mississippi Bubble, which started in 1716. Not only was the Mississippi Bubble bigger than the South Sea Bubble, it was more speculative and more successful. It completely wiped out the French government’s debt obligations at the expense of those who fell under the sway of John Law’s economic innovations.


Its origins trace back to 1684, when the Compagnie du Mississippi (Mississippi Company) was chartered. In August 1717, the Scottish businessman and economist John Law acquired a controlling interest in the then-derelict Mississippi Company and renamed it the Compagnie d’Occident. The company’s initial goal was to trade and do business with the French colonies in North America, which included most of the Mississippi River drainage basin and the French colony of Louisiana. Law was granted a 25-year monopoly by the French government on trade with the West Indies and North America. In 1719, the company acquired many French trading companies and combined them into the Compagnie Perpetuelle des Indes (CPdI). In 1720, it acquired the Banque Royale, which had been founded by John Law himself in 1716 as the Banque Generale (forerunner of France’s first central bank).


Law then created speculative interest in CPdI. Reports were skilfully spread of gold and silver mines discovered in these lands. Law exaggerated the wealth of Louisiana with an effective marketing scheme, which led to wild speculation in the company’s shares in 1719. Law had promised Louis XV that he would extinguish the public debt. To keep his word, he required that shares in CPdI be paid for one-fourth in coin and three-fourths in billets d’Etat (public securities), which rapidly rose in value on account of the fake demand created for them. The speculation was further fed by the huge increase in the money supply (printing more money to meet the growing demand) that Law introduced (as he was also Controller General of Finances, the equivalent of Finance Minister of France) in order to ‘stimulate’ the economy.


CPdI’s shares traded around 300 at the end of 1718 but rose rapidly in 1719, reaching 1,000 by July 1719 and breaking 10,000 in November 1719: an increase of over 3,000% in less than one year. The shares stayed at the 9,000 level until May 1720, when they fell to around 5,000. By the spring of 1720, more than 2 billion livres of banknotes had been issued, a near doubling of the money supply in less than a year. By then, Law’s system had exploded: the stock-market bubble burst, confidence in banknotes evaporated and the French currency collapsed. The company sought bankruptcy protection in 1721, and was reorganised and reopened for business in 1722. However, in late 1720 Law was forced into exile; he died in 1729. At its height, the capitalisation of CPdI was greater than either the GDP of France or all French government debt.

Why did Law fail? He was over-ambitious and over-hasty (like this Bitcoin pioneer?). He believed that France suffered from a dearth of money and from its incumbent financial system (Bitcoin enthusiasts claim it will revolutionize economies, and that countries like India are ideal for it), and that an increase in the money supply would boost economic activity (Bitcoin aims to implement a variant of Milton Friedman’s k-percent rule: a proposal to fix the annual growth rate of the money supply). He believed that printing and distributing more money would lower interest rates, enrich traders, and offer more employment to the people. His conceptual flaw was his belief that money and financial assets were freely interchangeable, and that he could set the price of stocks and bonds in terms of money.

Law’s aim was to replace gold and silver with a paper currency (just as Bitcoiners want to democratise or replace fiat money and eliminate banks). This plan was forced upon the French public: Law decreed that all large financial transactions were to be conducted in banknotes. The holding of bullion was declared illegal; even jewellery was confiscated. He recommended setting up a national bank (the Banque Generale, in 1716), which could issue notes to buy up the government’s debt and thus bring about a decline in the interest rate.

During both the South Sea and Mississippi bubbles, speculation was rampant and all manner of initial stock offerings were floated, including:

  • For settling the island of Blanco and Sal Tartagus
  • For the importation of Flanders Lace
  • For trading in hair
  • For breeding horses

Some of these made sense, but many more were absurd.

Economic value and price fluctuations of Bitcoin

Bitcoin is similar to other currencies and commodities such as gold, oil, potatoes or even tulips in that its intrinsic value is difficult – if not impossible – to separate from its price.

A currency has three main functions: store of value, means of exchange, and unit of account. Bitcoin’s volatility, seen when it fell 20% within minutes on November 29th 2017 before rebounding, makes it both a nerve-racking store of value and a poor means of exchange. A currency is also a unit of account for debt. For example, if you had financed your house with a Bitcoin mortgage, your debt would have risen 10x over 2017, while your salary, paid in dollars, would not have kept pace. Put another way, had Bitcoin been widely used, 2017 might have been massively deflationary.

But why has the price risen so fast? One justification for the existence of Bitcoin is that central banks, via quantitative easing (QE), are debasing fiat money and laying the path to hyperinflation. But this seems a very odd moment for that view to gain adherents. Inflation remains low and the Fed is pushing up interest rates and unwinding QE.

A more likely explanation is that as new and easier ways to trade in Bitcoin become available, more investors are willing to take the plunge. As the supply of Bitcoin is limited by design, that drives up the price.
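That supply limit follows directly from the protocol’s halving schedule: the block subsidy started at 50 BTC and halves every 210,000 blocks, so total issuance is a geometric series capped just below 21 million. A quick sanity check:

```python
# Total Bitcoin issuance from the halving schedule: the block reward
# starts at 50 BTC and halves every 210,000 blocks. Rewards are paid
# in satoshis (1 BTC = 100,000,000 satoshi) and truncated to integers,
# which is why the cap lands slightly below 21 million.
BLOCKS_PER_ERA = 210_000
SATOSHI = 100_000_000

total = 0
reward = 50 * SATOSHI
while reward > 0:
    total += BLOCKS_PER_ERA * reward
    reward //= 2          # halving, with integer truncation as in the protocol

total_btc = total / SATOSHI   # just under 21,000,000 BTC
```

New and easier ways to trade cannot change this arithmetic; only demand moves, which is why the price response is so sharp.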

There are governments standing behind currencies, and reliable currency markets for exchange. With commodities, investors at least have something to hold at the end of the transaction. Bitcoin is more speculative because it is digital ephemera. Not all investments are like that: stockholders are entitled to a share of a company’s assets, earnings and dividends, the value of which can be estimated independently of the stock’s price. The same can be said of a bond’s payments of principal and interest.

This distinction between price and value is what allowed many observers to warn that internet stocks were absurdly priced in the late 1990s, or that mortgage bonds weren’t as safe as investors assumed during the housing bubble. A similar warning about Bitcoin isn’t possible.

What about Initial Coin Offerings (ICOs)? An ICO (in almost all jurisdictions so far) is an unregulated means of raising capital for a new venture, bypassing traditional fund-raising methods. Afraid of missing out on the next big thing, people are willing to hand their money over no matter how thin the premise, much as in the South Sea or Mississippi bubbles. ICOs bear a close resemblance to penny-stock trading, with pump-and-dump schemes, thin disclosures and hot money pouring in and out of stocks.

ICOs, while an alternative financing scheme for startups, are not yet sustainable as businesses. Despite the fact that more than 200 ICOs have raised more than $3 billion so far in 2017, only 1 in 10 tokens is in use after the ICO. And the killer app for Ethereum, the most popular public blockchain platform and host to an increasing number of ICOs? The first ecosystem (a game for trading kittens) has launched and almost crashed the Ethereum network: this game alone consumes 15% of Ethereum traffic, and even then it is hard to play due to its slowness (thanks Markus for this info bite!).

So overall, Bitcoin (and other cryptocurrencies) exist mainly for the benefit of those who buy-and-hold and use them while creating an explicit economic program of counter-economics. In other words, Bitcoin is not so much about money as about power.

How it all may end (or begin)

The South Sea Bubble ended when the English government enacted laws to stop the excessive offerings. The Mississippi Bubble ended when the French currency collapsed and the French government bought back (and ultimately wrote off, via an early form of QE) all of CPdI’s shares and cast out the instigators. The unregulated markets became regulated.

From a legal perspective, most likely the same will happen to cryptocurrencies and ICOs. China has temporarily banned cryptocurrency exchanges until regulations can be introduced. Singapore, Malaysia and other governments plan to introduce regulations by the end of 2017 or early 2018. Disregard, ignorance or flouting of regulatory and other government-imposed rules can be mortal for startups and big businesses alike.

From a technology perspective, a number of factors might bring it down, including hard forks, ledger and wallet hacking, and Bitcoin’s sheer limitations around scaling, energy consumption and security. Also, many claims about blockchain/Bitcoin (that a blockchain is everlasting and indestructible, that miners provide security, that anonymity is universally a good thing) are exaggerated, only sometimes true, or patently untrue.

From a business perspective, startups and companies raising money via ICOs can be subject to fraud (Goldman Sachs’ CEO claims Bitcoin is a suitable means for conducting fraud) and thus to money-laundering, counter-terrorism and other relevant government legislation. From an investor’s perspective, shorting seems the most sure-fire way of investing profitably in cryptocurrencies.

During the dot-com craze, Warren Buffett was asked why he didn’t invest in technology. He famously answered that he didn’t understand tech stocks. But what he meant was that no one understood them, and he was right. Why else would anyone buy the NASDAQ 100 Index when its P/E ratio was more than 500x – a laughably low earnings yield of 0.2% – which is where it traded at the height of the bubble in March 2000?

It’s a social or anthropological phenomenon, reminiscent of how different tribes and cultures view the concept of money, from whales’ teeth to abstract social debts. How many other markets have spawned conceptual art about the slaying of a “bearwhale”?

Still, the overall excitement around Bitcoin shows that it has tapped into a speculative urge, one that isn’t looking to be reassured by dividends, business plans, cash flows, or use cases. Highlighting a big, round number like $10,000 only speaks to our emotional reaction to big, round numbers. But it doesn’t explain away the risk of this one day falling to the biggest, roundest number of all – zero.