# Scott Aaronson’s mini-course at UFF – 4th lecture

These are notes for Scott’s 4th lecture, delivered on Thursday, Dec. 19th, 2013 at UFF.

Solving exact BosonSampling is classically hard unless PH collapses. Here’s the argument:

1- Estimating $|per(X)|^2$ is already $\#P$-hard.

2- Given a fast BosonSampling simulator, we’d be able to estimate $|per(X)|^2$ in BPP with an NP oracle (using approximate counting).

3- $BPP^{NP} \subseteq NP^{NP^{NP}}$ (Sipser, Gács, Lautemann).

4- $PH \subseteq P^{\#P}$ (Toda’s theorem).

Conclusion: exact BosonSampling must be classically hard, since otherwise $PH \subseteq NP^{NP^{NP}}$, i.e. the polynomial hierarchy collapses to its third level.

Now we give a second argument for the hardness of simulating exact BosonSampling, this time using the concept of post-selection. This came out of independent work by Bremner, Jozsa and Shepherd, who studied the IQP model, a different type of restricted quantum computer, also believed not to be universal yet still hard to simulate. Using similar ideas, we can use linear optics to give an independent proof that the permanent is #P-complete.

We’ll also discuss what’d happen if we had a fast classical algorithm for approximate BosonSampling. We can argue that this is the physically relevant question, as any experiment will have imperfections. In particular, note that not even a realistic experiment itself can sample from the exact BosonSampling distribution, which motivates the study of the hardness of approximate BosonSampling.

## Class IQP is hard to simulate exactly

Let us now review the Bremner-Jozsa-Shepherd proof that exactly simulating quantum circuits in the IQP class is hard.

IQP means instantaneous quantum polynomial time, even though this is probably not the best description for this class. Consider qubits initialized in the all-zeros state $\left|00\ldots0\right\rangle$. Each qubit undergoes a Hadamard gate, then the whole register passes through a diagonal circuit that multiplies each basis state $\left|x\right\rangle$ by a phase $f(x)$, where $f:\{0,1\}^n \to \{1,-1\}$ is an efficiently computable function; this is followed by another layer of single-qubit Hadamards and measurements in the computational basis. (We can call these circuits “Hadamard-phase sandwiches”, for obvious reasons.) We can encode #P-complete problems in circuits like that. We can ask: what’s the amplitude of getting all zeroes at the end? It is $\frac{1}{2^n}\sum_x f(x)$, a sum over all $2^n$ values of the phase function f. Estimating this sum over an exponentially large number of arguments of f encodes a #P-complete problem.
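The all-zeros amplitude of such a Hadamard-phase sandwich can be checked by brute force for small n. A minimal sketch (the helper name and the toy phase function are mine, not from the lecture; this is of course an exponential-time computation, which is exactly the point):

```python
import itertools

def iqp_zero_amplitude(f, n):
    """Amplitude of |0...0> after H^n, the phase layer f, then H^n.

    The first Hadamard layer produces the uniform superposition; f
    multiplies each basis state |x> by f(x) = +/-1; the final Hadamard
    layer maps the amplitude of |0...0> to the average of the phases,
    (1/2^n) * sum_x f(x).
    """
    return sum(f(x) for x in itertools.product((0, 1), repeat=n)) / 2 ** n

# Toy phase function: f(x) = -1 iff the first two bits are both 1.
f = lambda x: -1 if (x[0] and x[1]) else 1
print(iqp_zero_amplitude(f, 3))  # (6 - 2) / 8 = 0.5
```

Estimating this average to high precision for a general efficiently computable f is the #P-hard task mentioned above.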

A related result was obtained by Terhal and DiVincenzo in 2003, who studied the complexity of constant-depth quantum circuits. They proved that exact simulation would imply an unlikely computational complexity result. What they didn’t notice then (and neither did Scott at the time) was that their result also implied the collapse of the polynomial hierarchy.

Going back to BosonSampling… In the approximate simulation case, we’ll need to start using specific properties of the permanent. The permanent, as we’ll see, is robustly #P-complete. One way to justify this statement: even restricted to quite special classes of matrices, computing the permanent remains #P-complete (more on that later).

Scott and Alex came to the topic knowing that the permanent was #P-complete. They were looking for quantum systems that would involve permanents – bosons seemed to fit the purpose. Only later did they actually get excited about the fact that bosons actually exist, which meant experiments could become a viable prospect!

What can we prove for the approximate case? It’s not as satisfactory as the exact case, but we can prove something. Before tackling the approximate case, however, let us give a second argument for the hardness of exact BosonSampling, an argument which is similar to the IQP proof by Bremner, Jozsa and Shepherd.

## Exact BosonSampling is hard: proof using post-selection

As we’ve mentioned, this proof uses the concept of post-selection. Post-selection is done naturally in experiments, to exclude unsuccessful events, the effects of limited-efficiency detectors, etc.

Let us define the complexity class PostBPP: probabilistic classical computation, with the added privilege of keeping only those runs in which a designated flag bit ends up as 1. More precisely, PostBPP is the class of languages L for which there’s a poly-time Turing machine M taking input x and a random string r, such that for every x some r sets the postselection flag. If x is in the language, the probability that M accepts, conditioned on the flag having been set, must be $\ge 2/3$; if x is not in the language, this conditional acceptance probability must be $\le 1/3$.

What can we say about this class? $NP \subseteq PostBPP$: guess a candidate solution uniformly at random and postselect on its verification succeeding. Scott talked about postselection as being possibly a problematic concept, appealing to the quantum-suicide thought experiment. Here’s a problem for would-be quantum-suicidal people: what if there are no solutions? That’d be a problem, but there’s a way around it. The class definition must include a provision that allows the machine, with small probability, to not do anything – as far as I understood, this is a technical fix for the problem that arises when no solutions exist.
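The inclusion $NP \subseteq PostBPP$ can be mimicked classically by rejection sampling: guess a witness at random and keep only the runs where verification succeeds. A toy sketch (the helper name and the example instance are hypothetical, chosen just to illustrate the idea):

```python
import random

def postselected_search(verify, n, trials=100_000):
    """Toy illustration of NP inside PostBPP: guess a random witness and
    postselect (here: rejection-sample) on the verifier accepting.
    Returns an accepting witness, or None if the flag never fires --
    which is exactly the troublesome "no solutions" case from the text.
    """
    for _ in range(trials):
        w = tuple(random.randint(0, 1) for _ in range(n))
        if verify(w):          # the "postselection flag" is set
            return w
    return None

# Hypothetical instance: find an assignment with w0 != w1 and w2 == 1.
witness = postselected_search(lambda w: w[0] != w[1] and w[2] == 1, 3)
```

Of course, classically the postselection is paid for with exponentially many rejected trials in the worst case; the complexity-class definition gets it for free.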

One can prove that post-selection doesn’t give much more power than NP: $NP \subseteq PostBPP \subseteq BPP^{NP}$. The second inclusion can again be proven using approximate counting.

Now the natural question: what if you have postselected quantum computation (= complexity class PostBQP)?

Obviously, $PostBPP \subseteq PostBQP$. Using Feynman’s path-integral argument, $PostBQP \subseteq P^{\#P}$. Yesterday we saw that $BQP \subseteq PP$ (PP is like BQP, but with no gap between accepting and rejecting). Exercise: prove that $P^{PP} = P^{\#P}$. This means that, in a sense, PP contains the power of #P. Hint: binary search on the number of solutions to a satisfiability problem. These considerations show that PostBQP sits between PostBPP and $P^{PP}$.
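The hint for the exercise can be made concrete: given a PP-style threshold oracle answering “are there at least k solutions?”, binary search pins down the exact #P count with O(n) queries. A sketch with a simulated oracle (function names and the toy instance are mine):

```python
def exact_count(threshold_oracle, n):
    """Recover the exact number of solutions among 2^n candidates using a
    PP-style threshold oracle ("is the count >= k?") via binary search.
    Uses only O(n) oracle queries instead of enumerating 2^n candidates.
    """
    lo, hi = 0, 2 ** n          # the count lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if threshold_oracle(mid):
            lo = mid            # count >= mid
        else:
            hi = mid - 1        # count < mid
    return lo

# Simulated oracle for a hypothetical predicate with 5 solutions over 2^4.
true_count = 5
oracle = lambda k: true_count >= k
print(exact_count(oracle, 4))   # -> 5
```

This is the whole content of $P^{PP} = P^{\#P}$: with polynomially many adaptive threshold queries, exact counting comes for free.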

Result by Scott (2004): PostBQP = PP. This means post-selection gives quantum computers huge power (well above NP). It also gives a quantum characterization of a classical complexity class. But who cares? What’s this result good for?

One answer: there’s an important theorem by Beigel-Reingold-Spielman (1991), stating that PP is closed under intersection. That means that for any pair of languages in PP, their intersection is also in PP. The original proof is based on low-degree rational function approximations. Interestingly, Scott’s theorem PP = PostBQP gives an alternative proof of the same result: given PostBQP algorithms for two languages, it’s easy to postselect on both postselection events at once, which shows the class is closed under intersection. This is an example of more intuitive proofs made possible by quantum mechanics; almost certainly this is an effect of quantum behaviour + postselection combined, as neither property alone would give the same result (we’ll see PostBPP is different from PostBQP, unless the PH collapses).

We’ve seen that PostBQP = PP, and that $PostBPP \subseteq BPP^{NP} \subseteq NP^{NP^{NP}}$.

Suppose we had a fast classical algorithm to simulate any quantum circuit. Then we’d be able to simulate post-selected quantum circuits by postselecting the classical simulation:

fast classical simulation $\Rightarrow PostBPP = PostBQP = PP \Rightarrow PP \subseteq BPP^{NP} \subseteq NP^{NP^{NP}} \Rightarrow PH \subseteq P^{\#P} = P^{PP} \subseteq P^{NP^{NP^{NP}}}$ (and the PH collapses)

Even if there’s a classical algorithm to solve all quantum decision problems, perhaps there’s still no way to sample from the probability distributions generated by quantum experiments – this is an important open problem. Even the ability to simulate just linear-optics experiments would imply the same unlikely complexity-theoretic consequence.

We’ve seen that Knill-Laflamme-Milburn showed that linear optics (LO) + adaptivity = BQP, i.e. take LO and add the resource of adaptive measurements; this yields BQP. The proof used a CSIGN gate that succeeds with probability 1/4, but heralds its success. Something stronger is true: if we can postselect at will, postselection will actually upgrade LO all the way to PostBQP. Even though LO is (probably) not equal to BQP, postselection gives the same class in both cases: PostBQP = PostBosonP = PP, leading to the collapse of the PH. This is the end of the second argument showing that the PH would collapse if we could exactly sample from the BosonSampling distribution.

Is BQP in PH? This is a big open problem. Here’s something Scott and Alex could say about this. Suppose there were a $BPP^{\Sigma_k}$ algorithm for exact BosonSampling, where $\Sigma_k$ is the kth level of the PH (we’re being generous with the classical simulator, giving it these oracles). Then $P^{\#P} \subseteq BPP^{\Sigma_{k+1}} \subseteq \Sigma_{k+3} \Rightarrow PH = \Sigma_{k+3}$, i.e. the PH collapses to its (k+3)rd level.

## Linear optical proof of #P-completeness of permanents

Bremner, Jozsa and Shepherd got their IQP result without using the hardness of computing permanents. So does the KLM theorem somehow “know” about the hardness of the permanent? This motivated an alternative proof, based on KLM, of the #P-completeness of computing the permanent.

Let’s use the power of LO to show #P-completeness of permanents.

Given $f:\{0,1\}^n \to \{1, -1\}$, we want the sum $S = \sum_x f(x)$. We can construct an IQP circuit such that the amplitude of $\left|00\ldots0\right\rangle$ at the end is $S/2^n$. Now, the KLM theorem says that it is possible, in poly time, to transform this circuit into an LO network U such that the amplitude for starting in the standard all-zeroes initial state and ending in that same state is $\left\langle I |\phi(U)| I \right\rangle = S/2^n$ times the success probability of all the CSIGN gates. That is, the amplitude is a scaled permanent, and the scaling factor is determined by the post-selection probability of the CSIGN gates. This gives us a matrix whose permanent gives us the sum S we want. This proof might not be the easiest, but it is natural if one already knows quantum mechanics.
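For concreteness, here is how the permanent appearing in these amplitudes can actually be computed for small matrices. Ryser’s inclusion–exclusion formula is standard textbook material, not part of the lecture; it runs in $O(2^n n^2)$ time rather than the $n!$ of the naive definition:

```python
from itertools import combinations

def permanent(M):
    """Permanent via Ryser's inclusion-exclusion formula:
    per(M) = (-1)^n * sum over nonempty column subsets S of
             (-1)^|S| * prod_i (sum_{j in S} M[i][j]).
    O(2^n * n^2) time -- still exponential, as expected for a
    #P-complete quantity, but far better than the n!-term sum."""
    n = len(M)
    total = 0
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):
            prod = 1
            for row in M:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** r * prod
    return (-1) ** n * total

print(permanent([[1, 2], [3, 4]]))  # 1*4 + 2*3 = 10
```

Ryser’s algorithm is still essentially the fastest known exact method, which is consistent with the #P-completeness story being told here.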

## BosonSampling: Approximate case

We consider the exact BosonSampling output distribution $D=\{p_x\}_x$ and an approximation to it, $D'=\{q_x\}_x$. Their closeness is quantified by the total variation distance $\|D-D'\| = \frac{1}{2} \sum_x |p_x - q_x|$.
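For distributions with finite support the total variation distance is trivial to compute; a small sketch (representing distributions as dicts is my choice, not the lecture’s):

```python
def total_variation(p, q):
    """Total variation distance (1/2) * sum_x |p_x - q_x| between two
    distributions given as dicts mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

# Two toy distributions over {0, 1}, differing by 0.1 on each outcome.
print(total_variation({0: 0.5, 1: 0.5}, {0: 0.4, 1: 0.6}))  # ~0.1
```

The operational meaning used below: 0.1 is exactly the best advantage over random guessing for distinguishing the two distributions from a single sample.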

Sampling from a multiplicative approximation of the ideal distribution is still hard. This is because of approximate counting: a multiplicative error only makes the estimate slightly worse, and the argument goes through as before.

The best we can realistically assume is that the non-ideal D’ is close in variation distance to D. How well can we distinguish D from D’ in a one-shot experiment? The answer is exactly the total variation distance. Suppose the circuit has T faulty components, each introducing error $\epsilon$ in variation distance; then the whole circuit introduces at most $T \epsilon$ error.

Suppose there’s just a tiny error in one beam splitter. We’d then be sampling from a D’ close to D. This motivates the definition of approximate BosonSampling. We’re given an m×n matrix A, whose associated distribution over experimental outcomes s satisfies $Pr_{D_A} (s)=|per(A_s)|^2/(s_1!\cdots s_m!)$. Now we want to sample from some D’ such that $\|D'- D_A\| \le \epsilon$, in time poly($n, 1/\epsilon$). If we already couldn’t get a good result under this error model, then certainly we couldn’t under realistic errors.

The exact-case hardness arguments break down for approximate BosonSampling. Take our m×n matrix A. The previous argument was about the hardness of computing the probability associated with the permanent of one particular submatrix of A. If we now only demand an approximate sampler, we have to think adversarially: the simulator could simply never output the one event associated with our chosen permanent, outputting all the others instead, and this would still result in a very small variation distance. I could try to smuggle my chosen matrix into some submatrix of A, but then I need to fill in the remaining entries carefully, so that the columns remain orthonormal. Even then, an adversary might identify which entries were filled in artificially, and refuse to output the outcomes corresponding to the permanent we’re interested in.

Solution: choose A as a submatrix of a Haar-random unitary. What will its submatrices look like? There’s a theorem from random matrix theory: let A be the m×n submatrix of a Haar-random unitary, and consider its topmost n×n submatrix X (any other n×n submatrix would do). Provided m is sufficiently large with respect to n ($m \gg n^2$ is enough), X has entries which are close to being i.i.d. Gaussian. If we looked at all elements of the original matrix we’d notice the unitarity constraint, but a small enough submatrix doesn’t feel the effect of unitarity. This is true, but complicated to prove.
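This effect is easy to observe numerically: sample a Haar-random unitary via the standard QR trick, cut out a small corner, and its rescaled entries look like i.i.d. standard complex Gaussians. A sketch (the $\sqrt{m}$ rescaling is the natural normalization, since Haar-unitary entries have mean square $1/m$; the helper name is mine):

```python
import numpy as np

def haar_unitary(m, rng):
    """m x m Haar-random unitary: QR-decompose a complex Gaussian matrix
    and apply the standard phase fix on the diagonal of R."""
    z = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))       # fix each column's phase

rng = np.random.default_rng(0)
m, n = 400, 4                        # m >> n^2, so unitarity is invisible locally
X = np.sqrt(m) * haar_unitary(m, rng)[:n, :n]
# Rescaled entries should look like i.i.d. standard complex Gaussians:
print(np.mean(np.abs(X) ** 2))       # should be near 1, since E|z|^2 = 1
```

With m comparable to n the same experiment visibly fails: the rows of a unitary are exactly normalized, and a large submatrix inherits that constraint.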

So what’s the use of such an A, chosen Haar-uniformly, with submatrices that are close to Gaussian distributed?

Suppose we had an i.i.d. Gaussian n×n matrix X, and we’re interested in its permanent. Let’s define the problem Gaussian Permanent Estimation (GPE): estimate per(X) to within a multiplicative factor $1 \pm \epsilon$, for a fraction $\ge 1-\delta$ of X’s, in time poly($n, 1/\epsilon, 1/\delta$).

Let’s consider a variant of this problem, which we shall call GPE’. Here we want to estimate $|per(X)|^2$ to within an additive error $\pm \epsilon\, n!$. Note that $n!$ is the natural scale, due to the number of terms in the permanent. Again we want to do this for a fraction $\ge 1-\delta$ of X’s, in time poly($n,1/\epsilon,1/\delta$).

Our main theorem can be stated as follows: suppose there’s a fast classical algorithm solving approximate BosonSampling in time poly($n,m,1/\epsilon$). Then $GPE' \in BPP^{NP}$. By itself this is not quite as strong as a PH collapse. So let us state Conjecture *: GPE’ is #P-hard. If Conjecture * is true, then a fast classical algorithm for approximate BosonSampling would imply that the PH collapses.

Special facts about the permanent play a role in the plausibility of Conjecture *, in particular the following theorem about matrices over finite fields: suppose X is uniformly random with elements over a finite field $F_p$ with p>2. If we could compute the permanent per(X) for most X, then we could compute the permanent for all X (and thereby solve a #P-complete problem, given p>2). We demand p>2 because the permanent is not #P-complete for p=2: in that case addition is the same as subtraction, so the determinant and the permanent coincide.

In other words, over $F_p$ the permanent has worst-case/average-case equivalence. Open problem: is there any NP-complete problem with this equivalence? This matters for cryptography, where one wants randomly chosen instances (e.g. randomly chosen keys) to be hard.

Why is that true for the permanent? The permanent is a low-degree polynomial (degree n in the entries of the matrix). For convenience, assume $p \gg n$. Let X be the matrix whose permanent we want to know. Imagine we have a black box that gives the right value of the permanent for most matrices. Pick another matrix Y uniformly at random, and use the black box to obtain per(X+Y), per(X+2Y), …, per(X+kY). Each of these matrices is individually uniformly random, because Y was. Consider per(X+kY) as a function of k: this is a polynomial of degree n. Now we have a bunch of values for this polynomial, many more than n of them, and we know most of them are correct – so let’s do low-degree polynomial interpolation, which runs in polynomial time (by the way, low-degree polynomials make excellent error-correcting codes). With the polynomial in hand, we can evaluate it at k=0 and find per(X).
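The interpolation argument can be rehearsed in code for a tiny n over $F_p$. For illustration the “black box” below is just a correct brute-force permanent on every query; a real reduction would need error-tolerant interpolation (e.g. Berlekamp–Welch decoding) to survive the bad fraction of answers. All helper names are mine:

```python
import random
from itertools import permutations

def per_mod(M, p):
    """Permanent mod p by brute force (fine for tiny n)."""
    n = len(M)
    total = 0
    for s in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod = prod * M[i][s[i]] % p
        total = (total + prod) % p
    return total

def per_via_blackbox(X, blackbox, p):
    """Recover per(X) while querying the black box only on uniformly
    random matrices X + k*Y.  q(k) = per(X + kY) is a degree-n polynomial
    in k, so n+1 values determine it; Lagrange interpolation at k = 0
    then yields q(0) = per(X)."""
    n = len(X)
    Y = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
    ks = range(1, n + 2)                      # n+1 distinct evaluation points
    vals = [blackbox([[(X[i][j] + k * Y[i][j]) % p for j in range(n)]
                      for i in range(n)]) for k in ks]
    result = 0
    for a, va in zip(ks, vals):               # Lagrange interpolation at 0
        num, den = 1, 1
        for b in ks:
            if b != a:
                num = num * (-b) % p          # factor (0 - b)
                den = den * (a - b) % p       # factor (a - b)
        result = (result + va * num * pow(den, -1, p)) % p
    return result

p = 101                                       # p >> n, as in the argument
X = [[3, 7, 1], [0, 5, 2], [4, 4, 9]]
print(per_via_blackbox(X, lambda M: per_mod(M, p), p) == per_mod(X, p))  # True
```

Note that each individual query X + kY is uniformly random, so a box that is correct on most matrices is correct on each query with high probability; that is the whole trick.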

So if we ask that the analogue of this fact should also hold for complex numbers, then our conjecture would be true.

So how can we prove the main theorem, that a fast approximate BosonSampler puts GPE’ in $BPP^{NP}$? The trick is to hide the matrix we care about. Choose an m×n matrix and smuggle the Gaussian X into a random location; by the random-matrix theorem above, it’ll be indistinguishable from the “background” matrix elements. The adversary (the simulator) has to output each s with probability close to the correct $p_s$. This indistinguishability forces the adversary to give good approximations for most submatrices – so, in particular, probably for the submatrix we care about.

Is GPE’ as hard as GPE itself? To show both have the same complexity, we need to assume yet another conjecture, for which numerical evidence has been found. Presumably this conjecture would need to be resolved before Conjecture * can be.

New conjecture, the permanent anti-concentration conjecture (PACC): there exists a polynomial p such that for all n and all $\delta>0$, for X drawn from the Gaussian ensemble, $\Pr\left[|per(X)| < \sqrt{n!}/p(n/\delta)\right] \le \delta$. The analogous statement can be proven for the determinant, and the statistical behaviour of the two looks roughly the same.
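Numerical evidence of the kind mentioned in the lecture can be gathered by sampling small Gaussian matrices and counting how often $|per(X)|$ falls below $\sqrt{n!}$ divided by a fixed polynomial. The particular choice $q(n) = n$ below is an arbitrary stand-in for the conjectured polynomial, and n = 4 keeps the brute-force permanent cheap; this is a sanity check, not a proof:

```python
import math
from itertools import permutations
import numpy as np

def permanent(M):
    """Naive n!-term permanent (fine for the tiny n used here)."""
    n = M.shape[0]
    total = 0
    for s in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= M[i, s[i]]
        total += prod
    return total

rng = np.random.default_rng(1)
n, trials = 4, 1000
# Threshold sqrt(n!)/q(n) with the stand-in polynomial q(n) = n.
threshold = math.sqrt(math.factorial(n)) / n
below = sum(
    abs(permanent((rng.standard_normal((n, n))
                   + 1j * rng.standard_normal((n, n))) / math.sqrt(2))) < threshold
    for _ in range(trials)
)
print(below / trials)   # fraction with an atypically small permanent
```

Since $E[|per(X)|^2] = n!$ for i.i.d. standard complex Gaussian entries, $\sqrt{n!}$ is the natural scale, and PACC asserts the mass below any inverse-polynomial fraction of that scale stays small.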

The upshot is a theorem we can prove assuming PACC: GPE and GPE’ satisfy the properties we’d like them to satisfy, and in particular they are poly-time equivalent.