Demis explains the policy and value neural networks.
The policy network is a neural network trained by supervised learning on 100,000 games to propose reasonable moves.
The value network is trained on tens of millions of games to estimate which positions are winning.
Scoring Go positions by value in this way was previously believed to be impossible.
AlphaGo uses the policy network to reduce the breadth of the search.
AlphaGo uses the value network to reduce the depth of the search for good moves.
AlphaGo also uses fast rollouts, playing a few thousand quick games to gather statistics on which moves are good.
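The breadth and depth reductions can be illustrated with a minimal sketch on a toy game, not Go: a pile of stones, each player removes 1 to 3, and whoever takes the last stone wins. Everything here is an assumption made for illustration: `policy` and `value` are hand-written stand-ins for the trained networks, and `search`, `best_move`, `top_k`, and `depth` are hypothetical names. This is a plain depth-limited search, not AlphaGo's actual search procedure.

```python
from typing import Dict, List

MOVES = (1, 2, 3)  # toy game: remove 1-3 stones; taking the last stone wins


def legal_moves(stones: int) -> List[int]:
    return [m for m in MOVES if m <= stones]


def policy(stones: int) -> Dict[int, float]:
    """Stand-in for a policy network: a prior probability per legal move."""
    # Hand-written heuristic (assumption): prefer leaving a multiple of 4.
    scores = {m: (3.0 if (stones - m) % 4 == 0 else 1.0)
              for m in legal_moves(stones)}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}


def value(stones: int) -> float:
    """Stand-in for a value network: estimate in [-1, 1] for the player to move."""
    return -1.0 if stones % 4 == 0 else 1.0


def top_candidates(stones: int, top_k: int) -> List[int]:
    priors = policy(stones)
    # Breadth reduction: keep only the top_k moves the policy likes best.
    return sorted(priors, key=priors.get, reverse=True)[:top_k]


def search(stones: int, depth: int, top_k: int = 2) -> float:
    """Score a position for the player to move."""
    if stones == 0:
        return -1.0  # the opponent took the last stone, so we have lost
    if depth == 0:
        # Depth reduction: stop searching and ask the value estimate instead.
        return value(stones)
    return max(-search(stones - m, depth - 1, top_k)
               for m in top_candidates(stones, top_k))


def best_move(stones: int, depth: int = 2, top_k: int = 2) -> int:
    return max(top_candidates(stones, top_k),
               key=lambda m: -search(stones - m, depth - 1, top_k))
```

With `top_k=2` the search never expands more than two moves per position, and with `depth=2` it never looks more than two moves ahead, so the cost stays small however long the game is.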
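The rollout idea can be sketched in the same spirit: play many quick games to the end from each candidate move and record how often each one wins. This is a minimal sketch on a toy game (a pile of stones, remove 1 to 3, last stone wins), with uniformly random moves standing in for a fast rollout policy. The names `rollout`, `rollout_stats`, `n_rollouts`, and `seed` are hypothetical, and nothing here reflects AlphaGo's actual rollout policy.

```python
import random
from typing import Dict


def rollout(stones: int, rng: random.Random) -> bool:
    """Play random moves to the end; True if the player to move at the start wins."""
    starter_to_move = True
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return starter_to_move  # whoever just moved took the last stone
        starter_to_move = not starter_to_move
    return False  # no stones left at the start: the player to move already lost


def rollout_stats(stones: int, n_rollouts: int = 2000, seed: int = 0) -> Dict[int, float]:
    """Estimate a win rate for each legal move from n_rollouts random games."""
    rng = random.Random(seed)
    stats: Dict[int, float] = {}
    for move in range(1, min(3, stones) + 1):
        remaining = stones - move
        # After our move the opponent is to move, so we win when they lose.
        wins = sum(not rollout(remaining, rng) for _ in range(n_rollouts))
        stats[move] = wins / n_rollouts
    return stats


if __name__ == "__main__":
    for move, rate in rollout_stats(5).items():
        print(f"take {move}: estimated win rate {rate:.2f}")
```

From 5 stones, taking 1 should come out with the highest estimated win rate, since it leaves the opponent with 4 stones, the weakest position for the side to move under random play.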
The AlphaGo program is improving by several levels every few months.