# Scott Aaronson’s mini-course at UFF, 5th (and final) lecture

Notes on Scott’s 5th and final lecture of the mini-course “Complexity theory and quantum optics”. This lecture was delivered on Dec. 20th, 2013, at UFF.

We start by addressing some loose ends from yesterday:

1- Why does polynomial interpolation not show the #P-hardness of GPE?

Approximating the permanent is hard, and for most matrices, exactly calculating it is also hard: over finite fields, there's an equivalence between the worst and average cases. Why doesn't the idea behind that proof apply to our case of complex-valued matrices? More concretely, let us suppose we can approximate the permanent of most matrices. Why doesn't this help us approximate the permanent of any other matrix?

Think of a set $M$ of matrices whose permanents we can estimate. Now I have a target matrix, and I would like to use polynomial interpolation to get a good estimate of its permanent by extrapolating the polynomial, using our set $M$.  The problem with this approach is that polynomial interpolation is not stable with respect to small errors; this issue didn’t come up in the finite-field case because with finite fields it doesn’t make much sense to talk about small errors. In order to “stabilize” the interpolation, the error bars have to be exponentially reduced, which is not a reasonable demand.
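The instability is easy to see numerically. A small sketch of our own (not from the lecture): interpolate $p(x) = x^{10}$ through 11 equally spaced nodes on $[0,1]$, corrupt a single sampled value by $10^{-6}$, and extrapolate to $x = 2$, where the exact answer is $2^{10} = 1024$.

```python
import numpy as np

def lagrange_eval(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i in range(len(xs)):
        term = ys[i]
        for j in range(len(xs)):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

deg = 10
xs = np.linspace(0.0, 1.0, deg + 1)
ys = xs ** deg
exact = lagrange_eval(xs, ys, 2.0)   # = 1024 up to rounding

ys_noisy = ys.copy()
ys_noisy[5] += 1e-6                  # a single "small error" in one data point
noisy = lagrange_eval(xs, ys_noisy, 2.0)

print(abs(noisy - exact))            # the 1e-6 error is amplified ~3e7-fold
```

The single $10^{-6}$ error gets multiplied by the Lagrange basis coefficient at $x = 2$, which here is of order $10^7$; to keep the extrapolated value accurate, the input error bars would have to shrink exponentially with the degree.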

2- Quantum speedups even if P=NP?

Why do we care whether there are problems in BQP not in the PH? Another way to put this: suppose P=NP. Would it still make sense to put effort into building quantum computers? Certainly not for factoring, but BosonSampling may still be hard for classical computers, for even if P=NP and the PH collapses, classes outside of the PH might still be hard.

## Scalability of boson sampling

BosonSampling offers the prospect of experimental implementations. These would involve building a (most likely photonic) bosonic linear-interferometer device; preparing n indistinguishable photons, sending them through the interferometer, then measuring the photons at the chip's output ports. Such an experiment wouldn't, as far as we know now, have practical applications, but it would be an interesting proof of principle. Note that there's another important motivation for an experiment like this: the resources required are also those necessary for universal computation (KLM-style); the only missing element would be feed-forward of measurement results (i.e., adaptivity). So a BosonSampling computer is a stepping stone towards universal photonic quantum computers.

There are people who question whether scalable quantum computers are possible. These people postulate a noise model, or some other fundamental reason on top of quantum mechanics, that acts to prevent large-scale quantum computers. Even though the sceptics are a minority, addressing this scepticism is another reason for doing experiments that show that there are things in Nature which are hard to simulate with classical computers.

Another question that is commonly asked: don’t we already have other hard-to-simulate systems (high-temp superconductors and the like)? Doesn’t this already refute the Extended Church-Turing Thesis? Answer: for those other problems, we can’t know whether we just haven’t been smart enough to find an answer, or whether the simulation is asymptotically hard. I.e., do they just seem hard or is the hardness intrinsic? In complexity theory, the type of evidence for hardness that we normally look for is a reduction: if we solve this problem, we’d be able to solve that other one (which would, hopefully, be an example of a problem in a hard complexity class).

So these are the motivations for boson sampling, and recently we have been finding other examples of likely hard-to-simulate quantum phenomena. BosonSampling is a model case where we’ve thought through what the issues are.

To obtain experimental evidence against the Extended Church-Turing (ECT) thesis, we have to face some difficulties. The ECT is an asymptotic statement about what's possible using polynomially-scaling resources. This kind of claim, by its very nature, can neither be confirmed nor refuted by any finite amount of experimental data. We won't spend much time on this point here, as this kind of argument could probably be persuasive only to mathematicians; in science, we look for overwhelming evidence, not mathematical certainty. Our aim, then, is to make the ECT less tenable as a scientific hypothesis: we'd like to do BosonSampling experiments with 10, 20, or 30 photons and check that they work as advertised, which would suggest that larger experiments would also behave as expected.

So what’s hard about scaling up BosonSampling experiments? The main difficulty seems to be to find a way to increase the probability of n-photon coincidences at the output. Let’s imagine we have n input photons and that we are in the “dilute regime” $m \sim n^2$.

Do we need number-resolving detectors? We don't, as long as $m \sim n^2$, because collision events are unlikely in this regime (the bosonic birthday paradox). So it suffices to discriminate zero photons from one or more; this kind of measurement is called a “bucket” measurement.
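A quick classical sanity check of the birthday estimate (ours; it treats photons as distinguishable balls in bins, so it ignores bosonic bunching, which roughly doubles the collision rate): with n photons spread over $m = n^2$ modes, the chance of any two landing in the same mode stays bounded as n grows.

```python
import random

def collision_prob(n, trials=20000, seed=1):
    """Monte Carlo estimate of P(some mode gets >= 2 of n balls in n^2 bins)."""
    m = n * n
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        modes = [rng.randrange(m) for _ in range(n)]
        if len(set(modes)) < n:   # at least one collision
            hits += 1
    return hits / trials

for n in (5, 10, 20, 40):
    print(n, collision_prob(n))   # hovers near 1 - e^{-1/2} ~ 0.39
```

So collisions are neither vanishing nor dominant; bucket detectors lose little, and the occasional collision run can simply be discarded.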

If we are to implement the proposal of Haar-random interferometers, we need to pick a single unitary uniformly at random and hard-wire the network, which is possible with current chip-writing techniques.

What about the tuning of each beam splitter's parameters? Experimentally, we can implement pretty much any beam splitter we want. A couple of months ago, Bouland and Aaronson showed that any single non-trivial fixed beam splitter suffices to approximate any unitary, using only a quadratic number of copies of it.

Now for the photon sources, which are the main experimental difficulty, and directly responsible for the limited size of experiments so far. We don't yet have reliable deterministic single-photon sources. Instead, we have non-deterministic single-photon sources: either we don't control exactly when they fire, or they output a superposition of 0, 1, 2, … photons (e.g. a coherent state).

We want to experimentally verify that the linear-optical permanent formula holds for as many photons as we can. The photons will be distinguishable if they arrive at different times. If the photons are completely distinguishable, the probabilities are again permanents, but of non-negative matrices, which can be approximated by the Jerrum-Sinclair-Vigoda algorithm. If we have partial distinguishability, we get more complicated formulas involving permanents of both general and non-negative matrices.

Just how many photons do we need? As many as possible! We can try to estimate the breakpoint where the experiment outperforms its classical simulation. The best we know about the hardness is that it's at least as hard as computing an $n \times n$ permanent, which takes time $O(n 2^n)$. Perhaps the complexity increases with the number of modes, but we don't know this yet.
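The $O(n 2^n)$ bound comes from Ryser's formula, which sums over column subsets instead of the $n!$ permutations. A minimal sketch:

```python
import numpy as np

def permanent(a):
    """Permanent via Ryser's formula: O(n 2^n) time, the bound quoted above."""
    a = np.asarray(a)
    n = a.shape[0]
    total = 0.0
    for s in range(1, 1 << n):                 # all non-empty column subsets
        cols = [j for j in range(n) if s >> j & 1]
        rowsums = a[:, cols].sum(axis=1)       # row sums over the subset
        total += (-1) ** len(cols) * np.prod(rowsums)
    return (-1) ** n * total

# Sanity checks: per(I) = 1, and the all-ones n x n matrix has permanent n!.
print(permanent(np.eye(3)))        # 1.0
print(permanent(np.ones((4, 4))))  # 24.0
```

The formula also works entrywise for complex matrices, which is the case relevant to BosonSampling amplitudes.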

How complex is the beam-splitter network? We need n photons, $n^2$ modes, and of order $n^4$ beam splitters. But simplifications are possible. We don't care about the part of the network that the photons never visit, so this brings the total number of required beam splitters down to $O(n^3)$. Further simplification is possible, at the cost of requiring that arbitrary pairs of waveguides be able to “talk” to each other. This may bring experimental complications requiring 3D designs or other solutions. We could choose to implement a pseudo-random unitary with a smaller beam-splitter decomposition; this may be enough for our purposes. Cutting corners may result in simplifications that compromise the hardness claim – this is similar to the interplay, in cryptography, between practical and theoretical work.

What’s the cross-over point, when the BosonSampling experiment will start being hard to simulate on a classical computer? Scott’s guess is that it’ll take about 30 photons.

Question: what would be good figures of merit to characterize pseudo-random ensembles? We can check a given unitary doesn’t belong to a class whose permanent can be computed efficiently; other than that, we can look for various traits which are typical of the Haar ensemble.

Question: what about photon polarization? It amounts to just another mode, so it may be helpful in practice. Another way to put this: photons with different polarizations going through the same beam splitter are actually in different modes.

## Dealing with probabilistic single-photon sources

Let us look closer at what we can do with n probabilistic single-photon sources, each of which fires with probability $c$ in a given interval of time. If we just wait for all n of them to fire at once, we'll wait a long time: the probability of that happening in any given interval is $c^n$, so the expected waiting time scales like $1/c^n$. How can we deal with this problem?
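To put numbers on the scaling (our illustration, with an assumed per-bin firing probability of $c = 0.1$): all n sources fire in the same time bin with probability $c^n$, so the expected number of bins to wait grows like $c^{-n}$.

```python
# c = 0.1 is an assumed value, just for illustration.
c = 0.1
for n in (4, 8, 16, 30):
    # Expected wait (in time bins) for an n-fold coincidence is c^{-n}.
    print(n, f"expected wait ~ {c ** -n:.2g} time bins")
```

Already at n = 16 the wait is astronomically long, which is why the approaches below matter.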

1- Invent a deterministic single-photon source. There's this idea of “optical multiplexing”, i.e., wiring many non-deterministic photon sources so as to route photons into a single mode; Jeremy O'Brien seems optimistic about this.

2- Sample a hard distribution even without deterministic single-photon sources. Look at sources that actually exist and try to use them to sample from a distribution for which we can make the same hardness argument.

2a: Coherent states (lasers), with Fock-state expansion $\sum_n \alpha_n \left| n \right\rangle$. The number of photons follows a Poisson distribution. These states are useless for BosonSampling: if we act on a collection of coherent states with the interferometer unitary U, the result is a collection of coherent output modes. The output can be found by just multiplying U by the vector of input amplitudes; this is too easy to simulate, as everything behaves like classical light. Even postselecting with number-resolving detectors doesn't help. What could help is measurement with number-resolving detectors and feed-forward – postselection on having a single photon enables us to effectively create a single-photon source.
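A sketch (ours) of why the coherent-state case is classically easy: simulating an m-mode interferometer acting on coherent inputs amounts to one matrix-vector product on the vector of amplitudes.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(m, rng):
    """Haar-ish random unitary via QR of a complex Gaussian matrix."""
    z = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))   # fix column phases

m = 6
U = random_unitary(m, rng)
alpha_in = rng.normal(size=m) + 1j * rng.normal(size=m)  # coherent amplitudes
alpha_out = U @ alpha_in                                 # the entire simulation

# Mean photon number is conserved, since U is unitary:
print(np.sum(np.abs(alpha_in) ** 2), np.sum(np.abs(alpha_out) ** 2))
```

No permanents appear anywhere: the output is again a product of coherent states, fully described by m complex numbers.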

2b: Spontaneous Parametric Down-Conversion sources.

In between the cases of Fock and coherent states, we have Gaussian squeezed states. When we pump certain crystals with a laser, sometimes a pair of photons is produced, in what is called a Spontaneous Parametric Down-Conversion (SPDC) process. Although non-deterministic, the detection of one of these photons heralds its twin. This has been the source used in the first practical BosonSampling experiments. One issue we must be aware of: there are (small) higher-order terms corresponding to the production of more than two photons.

## BosonSampling with Gaussian inputs

What can we say about computational hardness for boson sampling with Gaussian inputs? Homodyne measurements project onto Gaussian states. For the combination of Gaussian inputs and homodyne detections, Bartlett and Sanders proved that there's an efficient classical simulation, a continuous-variable analogue of the Gottesman-Knill theorem. So if we have Gaussian inputs, we'd better have measurements onto non-Gaussian states (e.g. number-resolving detectors, or bucket detectors, which only discriminate zero photons from one or more). The converse case – the hardness of Fock-state inputs with homodyne measurements – remains open.

Hardness of exact BosonSampling goes through for Gaussian inputs and photo-detection at the end. Why? Our argument relies on the power of postselection. We use many SPDC sources (say, one coupled to each input mode) and measure the heralding photons, postselecting on seeing one photon in each herald mode. Conditioned on that, we obtain definite single-photon Fock states at the inputs of n modes, and can do standard boson sampling.

What’s the computational complexity of approximate BosonSampling for Gaussian inputs? In a visit to Ian Walmsley’s group in Oxford, he and Steve Kolthammer proposed the following idea to Scott. Later, Scott also found out about independent work with similar ideas. This leads to a new version of BosonSampling, which Scott took to calling scattershot BosonSampling.

Basic idea: use many strongly attenuated SPDC sources, so that each has a small but non-zero probability of producing a photon pair. Take $n^2$ sources, one for each input mode of the interferometer, each with a probability $\sim 1/n$ of producing a pair during a given time interval. Each signal photon is heralded by its twin, so we know which SPDCs produced single photons in each time window. The photons produced pass through the interferometer U, to be measured at the exit ports.

The only difference between this procedure and usual BosonSampling is that now we can't predict which input modes are used in each run. In the new approach, we're sampling over inputs and outputs, instead of just outputs. The key observation is that the $m \times m$ unitary U is Haar-random, so whichever $n \times n$ submatrix gets selected will be close to i.i.d. Gaussian, for the same reasons we discussed in usual BosonSampling.

Question: do we not have fluctuations in the number of inputs? Well, we'll have a non-negligible probability of having exactly n heralded input photons, so that's sufficient. In any case, the other runs are also interesting and hard to simulate, even though they'll have fewer (or more) than n photons.
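The fluctuation question can be made quantitative (our estimate): the number of heralded sources is Binomial($n^2$, $1/n$), with mean n, so by the local central limit theorem the probability of exactly n heralds decays only like $1/\sqrt{n}$, not exponentially.

```python
import math

def prob_exactly_n(n):
    """P(exactly n of n^2 sources fire, each independently with prob 1/n)."""
    m, p = n * n, 1.0 / n
    return math.comb(m, n) * p ** n * (1 - p) ** (m - n)

for n in (5, 10, 20, 30):
    print(n, prob_exactly_n(n))   # decays roughly like 1/sqrt(2*pi*n)
```

So a constant fraction of runs carries exactly n photons, and the rest are still usable samples from closely related hard distributions.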

Photon detection efficiencies are another factor limiting the size of present experiments. NIST is developing photo-detectors with up to 98% or 99% efficiency, but this is still an issue.

Another problem: higher-order terms in the SPDC process, in which more than two photons are produced. As long as we know which sources did that, this is OK, as the result is just a boson sampling distribution with a few repeated rows (which is still presumably hard to simulate). Another answer: suppose the sources are attenuated enough that only a $1/n$ fraction of the $n^2$ sources produce photons. Besides guaranteeing no collisions at the output, one can show that this also prevents collisions in the input modes. In the Fock basis, if the amplitude $\alpha_1$ is $1/\sqrt{n}$, then $\alpha_2$ is $\sim 1/n$.

So for Scattershot BosonSampling we can say the same: if there were a polynomial-time classical algorithm to approximate it, then $GPE' \in BPP^{NP}$.

We don’t know yet whether an oracle for Scattershot BosonSampling enables us to solve ordinary BosonSampling. The reduction in the other direction is easy: if we have an oracle for BosonSampling, it is easy to solve Scattershot BosonSampling.

Question: does the distribution of inputs need to be uniform? No, the argument goes for any distribution of inputs.

## Verification

How do you convince anyone else that you have a functioning BosonSampling device? As BosonSampling is not known to be in NP, there’s no easy verification, unlike factoring. The output is a collection of samples $S_1, S_2, \ldots, S_T \in \Phi_{m,n}$. We’d like to claim that the $S_i$’s were sampled from the correct BosonSampling distribution $D_A$.

In practice, and for the foreseeable future, we’ll still be able to calculate all the probabilities (say, with up to n=10 photons). We can take the submatrices associated with all events, calculate their permanents, plot them in a histogram, and check whether the experimental outcomes agree better with this distribution than with others.
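For small n this brute-force check can be carried out end to end. A self-contained sketch of our own (n = 3 photons in m = 9 modes): enumerate every collision-free outcome, compute its probability $\propto |\mathrm{per}(A_S)|^2$, draw samples from the resulting distribution, and compare the empirical frequencies with the computed ones.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

def permanent(a):
    """Naive permanent over permutations; fine for 3 x 3 matrices."""
    n = a.shape[0]
    return sum(np.prod([a[i, p[i]] for i in range(n)])
               for p in itertools.permutations(range(n)))

m, n = 9, 3
z = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
U, _ = np.linalg.qr(z)          # Haar-ish random unitary
A = U[:, :n]                    # photons enter the first n modes

outcomes = list(itertools.combinations(range(m), n))  # collision-free events
weights = np.array([abs(permanent(A[list(S), :])) ** 2 for S in outcomes])
probs = weights / weights.sum()  # renormalize over collision-free outcomes

samples = rng.choice(len(outcomes), size=20000, p=probs)
freqs = np.bincount(samples, minlength=len(outcomes)) / 20000
print(np.max(np.abs(freqs - probs)))   # empirical vs computed: small
```

A real experiment would play the role of `rng.choice` here; everything else is the classical verification step described above.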

Even with somewhat larger experiments (n=30, say), we could still calculate the permanents with a classical computer; it would take longer but would still be feasible. The situation would be different with 100 or 1000 photons: how would we ever know such an experiment is running as it should?

The paper by Gogolin, Eisert, et al. (2013) argues that a large BosonSampling experiment would be impossible to verify even in principle. Their arguments: 1- the problem is not in NP; 2- suppose we only allow a symmetric verifier. Given a collection $S_1, S_2, \ldots, S_T$ of experimental outcomes, a symmetric verifier is not allowed to look at the outcomes themselves and use U to check whether they are reasonable. All the verifier may use is the list of event multiplicities, i.e. the list saying that $n_0$ events never happened, $n_1$ events happened once, $n_2$ events happened twice, and so on. It can be shown that such a verifier would require an exponential sample size (and exponential time) to distinguish a BosonSampling distribution from the uniform distribution, because that is how long it takes for events to start repeating.

Scott’s response: why on Earth should we put this restriction on the verifier? Any quantum algorithm can be made impossible to verify with this restriction. For Shor’s algorithm, for example, it’d be like forbidding us to look at the output factors, and only have access to the number of times each digit appears, which makes the output impossible to verify.

The Gogolin et al. paper became a springboard to think more about verification of boson sampling. This resulted in the paper “BosonSampling is far from uniform”, by Scott and Alex Arkhipov. First question we looked at: if we drop the symmetric-algorithm restriction, is BosonSampling close to uniform or not? What is the total variation distance $d = \| D_A - D_U \|$ between the boson sampling distribution and the uniform distribution? Being far from uniform is not sufficient for the distribution to be hard to simulate, but it is certainly necessary.

Intuitively, we can look at the BosonSampling distribution. Each probability is given by the absolute value squared of a sum of exponentially many contributions which almost cancel out, leaving a small residue; so there are fluctuations in the distribution. In the new paper, Scott and Alex proved that, with high probability over Haar-random unitaries, $\|D_A - D_U\| > 0.313$. The way to prove this is to consider a meta-probability distribution, i.e. a distribution over probability distributions, corresponding to picking Haar-random unitaries. Take a single possible outcome. The submatrix associated with it, if i.i.d. Gaussian, has well-characterized fluctuations; its distribution is close to a log-normal distribution. If $D_A$ were uniform, one can show we’d instead see a distribution with a spike. Scott and Alex proved that the i.i.d. Gaussian distribution has a pdf that’s non-increasing, and this gives a way to get bounds on the distance from the uniform distribution. The matrices not being exactly i.i.d. Gaussian is no problem, as their variation distance to Gaussian matrices is small and we can use the triangle inequality.

So we know the BosonSampling distribution is not close to uniform. But how do you distinguish it from the uniform distribution in polynomial time? A pseudorandom number generator produces lower entropy than true random numbers, yet no polynomial-time algorithm should tell the difference (if the pseudo-random numbers are to be useful in cryptographic applications). Similarly, we could fear that, even though the BosonSampling distribution is far from uniform, no polynomial-time test could detect the difference. The obvious way to tell them apart would be to calculate permanents, but that is not efficient.

Theorem: the two can be distinguished efficiently. Once we specify what the null hypothesis is, we can often design a poly-time algorithm specifically for distinguishing BosonSampling data from that null hypothesis. For our problem, we should build an estimator that, on average, takes sufficiently different values under the two distributions, while being efficiently computable. Given the $n \times n$ matrix X associated with an output, we need a quantity that is correlated with the permanent but easy to compute. One such quantity is $R = \prod_{i=1}^{n} \sum_{j=1}^n |x_{ij}|^2$, i.e. $R$ is the product of the squared norms of the rows of X. Why might this fit the purpose? If we multiply a row by a constant, the permanent gets multiplied by the same constant; if we rescale each row $i$ by a different constant $\lambda_i$, the new matrix $X'$ has permanent $\mathrm{per}(X') = \mathrm{per}(X) \prod_i \lambda_i$.

Let’s rescale R to have expectation 1. Each factor in the product is a sum of n squared Gaussian magnitudes, so it has fluctuations of order $\pm 1/\sqrt{n}$ around its mean. Multiplying n of these factors results in constant-size fluctuations, which is good for us. If we’re sampling from the correct BosonSampling distribution, the samples will be biased towards large-R events, and this makes it possible to distinguish the distributions.
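The bias is easy to see numerically for small n. A Monte Carlo sketch of our own: draw i.i.d. complex Gaussian $4 \times 4$ matrices, rescale $R$ to have mean 1 under the Gaussian ensemble, and compare its average there with its average under the $|\mathrm{per}(X)|^2$-biased ensemble that an ideal BosonSampling device samples from.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, trials = 4, 5000

def permanent(a):
    """Naive permanent by summing over permutations; fine for n = 4."""
    return sum(np.prod([a[i, p[i]] for i in range(n)])
               for p in itertools.permutations(range(n)))

Rs = np.empty(trials)
weights = np.empty(trials)
for t in range(trials):
    # Entries with E|x_ij|^2 = 1:
    X = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    Rs[t] = np.prod(np.sum(np.abs(X) ** 2, axis=1))  # product of row norms^2
    weights[t] = abs(permanent(X)) ** 2

Rs /= float(n) ** n   # E[row norm^2] = n, so E[R] = n^n under the Gaussian
uniform_avg = Rs.mean()                       # close to 1
boson_avg = np.average(Rs, weights=weights)   # noticeably larger
print(uniform_avg, boson_avg)
```

A short moment calculation (ours) gives the biased mean exactly: it is $((n+1)/n)^n$, about 2.44 for n = 4, versus 1 for the unbiased ensemble, so the gap persists as n grows (approaching $e$).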

This doesn’t answer the question of whether R distinguishes BosonSampling from every other distribution. Observe that R behaves the same way for fermion sampling, as the determinant is affected by row rescaling in the same way the permanent is; the same holds for classical distinguishable particles. We know how to design efficient tests against these other distributions; we’ve verified numerically that they work, but a rigorous proof is still in the works.