What the Quantum Fourier Transform is (and isn’t)

In the last post, I discussed the Fourier Transform on finite groups G. Here I’ll discuss in more detail the case G=Z_N, the finite cyclic group of order N. This is what is usually called the Discrete Fourier Transform.

The (Quantum) Fourier Transform is a unitary transformation on \mathcal{H}=\ell^2(Z_N)=C^N, which in the standard basis is expressed by the matrix

\begin{aligned} U_\text{QFT}=\frac{1}{\sqrt{N}}\begin{pmatrix}1&1&1&\cdots&1\\ 1&\omega&\omega^2&\cdots&\omega^{N-1}\\ 1&\omega^2&\omega^4&\cdots&\omega^{2(N-1)}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&\omega^{N-1}&\omega^{2(N-1)}&\cdots&\omega^{(N-1)^2}\end{pmatrix} \end{aligned}

with \omega=e^{2\pi i/N}. It turns out that if N=2^n and we interpret the Hilbert space \mathcal{H}=C^N as the state space of n qubits, U_\text{QFT} can be implemented efficiently, using only about n^2 elementary (1- and 2-qubit) gates.
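
As a quick sanity check, here is a minimal numpy sketch (qft_matrix is my name, not a library routine) confirming that this matrix is unitary and that it agrees, up to complex conjugation, with numpy’s orthonormalized FFT convention:

```python
import numpy as np

def qft_matrix(N):
    """N x N unitary DFT matrix with entries omega^{jk}/sqrt(N), omega = e^{2 pi i/N}."""
    j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * j * k / N) / np.sqrt(N)

U = qft_matrix(8)
assert np.allclose(U @ U.conj().T, np.eye(8))                        # unitarity
assert np.allclose(U, np.conj(np.fft.fft(np.eye(8), norm="ortho")))  # FFT, conjugated
```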

Efficient Circuit for the QFT

Consider an n-qubit state
\begin{aligned} \left\vert\psi\right\rangle=\sum_{s=0}^{2^n-1}\alpha(s)\left\vert s\right\rangle. \end{aligned}
The n-qubit QFT acts on this state as follows:

\begin{aligned} U_\text{QFT}^{(n)}\left\vert\psi\right\rangle=\sum_{s=0}^{2^n-1}\alpha(s)\,\frac{1}{\sqrt{2^n}}\bigotimes_{l=1}^{n}\left(\left\vert 0\right\rangle+e^{2\pi is/2^l}\left\vert 1\right\rangle\right), \end{aligned}
where each factor can be generated one qubit at a time using the Hadamard gate H together with phase gates P and controlled phase gates cP (defined implicitly by this factorization). Therefore, we have

\begin{aligned} U_\text{QFT}^{(n)}=\left(I_{n-1}\otimes H\right)cP^{(n-1)}\left(U_\text{QFT}^{(n-1)}\otimes I\right). \end{aligned}
The controlled phase gate can be implemented with n-1 elementary (2-qubit) controlled gates, and the Hadamard gate H is a single qubit gate, so if we can implement the (n-1)-qubit QFT, we just need n additional elementary gates to implement the n-qubit QFT. The 1-qubit QFT is just a Hadamard gate, so we can implement the n-qubit QFT with n+(n-1)+(n-2)+\ldots +1=n(n+1)/2 elementary 1- and 2-qubit gates. I won’t reproduce the circuit here – nice drawings are available anywhere qubits are sold.
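
To make the count concrete, here is a small numpy sketch of the standard textbook circuit (Hadamards plus controlled phases, most significant qubit first; the helper names are mine, and the qubit-ordering conventions differ from the recursion above, though the gate count is the same). As usual, the circuit produces its output in bit-reversed qubit order, and the final swaps are not included in the n(n+1)/2 count, so the comparison below undoes the bit reversal by hand:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def embed(gate, q, n):
    """Embed a single-qubit gate on qubit q (0 = most significant) of n qubits."""
    U = np.array([[1.0 + 0j]])
    for i in range(n):
        U = np.kron(U, gate if i == q else np.eye(2))
    return U

def cphase(q1, q2, theta, n):
    """Diagonal two-qubit gate: phase e^{i theta} when qubits q1 and q2 are both 1."""
    d = np.ones(2 ** n, dtype=complex)
    for s in range(2 ** n):
        if (s >> (n - 1 - q1)) & 1 and (s >> (n - 1 - q2)) & 1:
            d[s] = np.exp(1j * theta)
    return np.diag(d)

def qft_from_gates(n):
    """Hadamards plus controlled phases; output is in bit-reversed qubit order."""
    U = np.eye(2 ** n, dtype=complex)
    count = 0
    for q in range(n):
        U = embed(H, q, n) @ U
        count += 1
        for t in range(q + 1, n):
            U = cphase(q, t, 2 * np.pi / 2 ** (t - q + 1), n) @ U
            count += 1
    return U, count

n = 4
N = 2 ** n
U, count = qft_from_gates(n)
rev = [int(format(s, f"0{n}b")[::-1], 2) for s in range(N)]  # undo bit reversal
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
assert np.allclose(U[rev, :], np.exp(2j * np.pi * j * k / N) / np.sqrt(N))
assert count == n * (n + 1) // 2
```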

That’s the good news. The bad news is that the QFT is not an algorithm for the Discrete Fourier Transform in the sense we would really like it to be, i.e. a black box that takes a vector of classical data and outputs, as classical (i.e., readable) data, the Fourier transform of that vector. Instead, the QFT is a unitary transformation that takes as input a quantum state and gives as output another quantum state. So the QFT is not an exponentially faster replacement for the classical FFT algorithm, which does precisely this in time N\log N=n 2^n. In order to make use of the QFT as a limited replacement for the FFT, we need to figure out (1) what classical data can be efficiently encoded in the proper form and (2) what data can be efficiently extracted from the output state.

Efficiently Encodable Data

One example of data that can be efficiently encoded into a quantum state, given that it is already stored in a quantum-accessible manner, is the following. Suppose that we have an oracle \mathcal{D} capable of performing the transformation \mathcal{D}:\left\vert s\right\rangle\mapsto\left\vert s\right\rangle\left\vert D(s)\right\rangle, where D is a \left\lbrace 0,1\right\rbrace-valued function. Such an oracle can be efficiently implemented with the QRAM protocol (see e.g. https://arxiv.org/pdf/0708.1879.pdf). Then applying \mathcal{D} to the uniform superposition over all computational basis states (efficiently preparable from the zero state of n qubits via n Hadamard gates), measuring the second register in the computational basis, and post-selecting for outcome 1, we obtain the state

\begin{aligned} \mathcal{N}\sum_{s:D(s)=1}\left\vert s\right\rangle \end{aligned}

where \mathcal{N} is the normalization factor. This is a state of the form

\begin{aligned} \left\vert\psi\right\rangle=\sum_{s=0}^{M-1}\alpha(s)\left\vert s\right\rangle \end{aligned}
with M=2^n and \alpha(s)\propto D(s), so now the QFT can be used to Fourier transform D.

If we assume that a constant fraction of the values of s have D(s)=1, then each attempt succeeds with constant probability, so we can prepare the state \left\vert\psi\right\rangle with probability exponentially close to one in the number of tries.
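
Since only the post-selected amplitudes matter, the bookkeeping here can be simulated classically. A sketch under the assumptions above (a toy random D with a constant fraction of ones; names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
N = 2 ** n
D = (rng.random(N) < 0.5).astype(float)  # toy oracle data: a constant fraction of 1s

# Applying the oracle to the uniform superposition and measuring the second
# register gives outcome 1 with probability equal to the fraction of marked s.
p_success = D.mean()

# The post-selected state is the normalized indicator of {s : D(s) = 1}.
psi = D / np.linalg.norm(D)

# Failing T times in a row has probability (1 - p)^T, exponentially small in T.
T = 25
print("per-try success:", p_success, " failure after T tries:", (1 - p_success) ** T)
```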

Efficiently Extractable Data

Here I’ll show that the QFT can be used in an efficient procedure for estimating the period of a quantum state. Since this application is at the heart of Shor’s algorithm, it’s probably the most famous use of the QFT.
Suppose that the vector \left\vert\psi\right\rangle=\sum_{s=0}^{M-1}\alpha(s)\left\vert s\right\rangle is periodic with period p, i.e. \alpha(s)=\alpha(s+p) for all s, and suppose also that p divides M. We have that T^p\left\vert\psi\right\rangle=\left\vert\psi\right\rangle, where T\left\vert s\right\rangle=\left\vert s+1\right\rangle is the shift operator (addition modulo M). Using the unitarity of the QFT, we then have
\begin{aligned} \left(U_\text{QFT}TU_\text{QFT}^\dagger\right)^pU_\text{QFT}\left\vert\psi\right\rangle=U_\text{QFT}\left\vert\psi\right\rangle. \end{aligned}
Expanding U_\text{QFT}\left\vert\psi\right\rangle in the Fourier basis, this gives
\begin{aligned} \sum_{k=0}^{M-1}\hat{\alpha}(k)\hat{T}^p\left\vert k\right\rangle=\sum_{k=0}^{M-1}\hat{\alpha}(k)\left\vert k\right\rangle \end{aligned}
where the hat denotes change of basis into the Fourier basis. It is simple to verify that \hat{T}\left\vert k\right\rangle=e^{2\pi ik/M}\left\vert k\right\rangle, so that in order for this equation to hold, \hat{\alpha}(k) must be zero unless kp is a multiple of M or, equivalently, k is a multiple of M/p (which we assumed was an integer). There are p such values of k.

This analysis shows that if we apply the QFT to \left\vert\psi\right\rangle and measure in the computational basis, we will obtain a value k=mM/p for some integer m. Suppose we prepare J copies of \left\vert\psi\right\rangle, apply the QFT to each one, and measure in the computational basis. We obtain values k_1,\ldots,k_J, with k_i=m_iM/p. If the gcd of the m_i is one, then the gcd of the k_i is M/p, and since M is known, we obtain p. Of course, it’s possible that the \hat{\alpha}(k) are exponentially small except on the multiples of M/\tilde{p} for some proper divisor \tilde{p} of p, in which case we’ll underestimate p with high probability unless J is exponentially large, and the quantum speedup is lost. However, in that case \left\vert\psi\right\rangle is very close to a state with the period we estimated, so it’s not unreasonable to be satisfied with this performance.

On the other hand, suppose that all of the p non-zero \hat{\alpha}(k) have roughly the same magnitude. Then the probability that with J samples we correctly find p is the same as the probability that the gcd of J integers drawn uniformly at random from 0,1,2,\ldots,p-1 is one. It is not too hard to see that this probability is exponentially close (in J) to one. If we didn’t do the classical post-processing of the outcomes and just looked for the smallest value of k that appeared, we would not get this performance.
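
The whole extraction procedure is easy to simulate classically, with the FFT standing in for the QFT (in the \omega=e^{2\pi i/M} convention used here, the QFT is numpy’s inverse FFT with orthonormal scaling). A sketch, assuming p divides M and amplitudes with exact period p (names are mine):

```python
import numpy as np
from math import gcd
from functools import reduce

rng = np.random.default_rng(0)
M, p = 256, 8                        # M = 2^n, with p dividing M
block = rng.random(p) + 0.1          # one period of (positive) amplitudes
alpha = np.tile(block, M // p)       # alpha(s) = alpha(s + p)
alpha = alpha / np.linalg.norm(alpha)

ahat = np.fft.ifft(alpha, norm="ortho")  # QFT with omega = e^{2 pi i / M}
probs = np.abs(ahat) ** 2
probs /= probs.sum()

# The support is confined to multiples of M/p, as derived above.
support = np.nonzero(probs > 1e-12)[0]
assert all(k % (M // p) == 0 for k in support)

# Measure J copies and take the gcd of the outcomes.
J = 10
ks = rng.choice(M, size=J, p=probs)
g = reduce(gcd, ks.tolist(), 0)      # gcd(0, x) = x, so outcome 0 is harmless
print("estimated period:", M // g if g > 0 else "no information")
```

For generic amplitudes, a handful of samples already lands the gcd on M/p with high probability, in line with the estimate above.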

Failed Example

Combining these three parts (state preparation/data entry, transformation/data processing, and measurement/data extraction), we can try to build a quantum algorithm that gives with high probability a speedup over a classical computer for a simple problem. Suppose that we’re given a string of 2^n bits, stored either in a classical RAM or in a QRAM, and would like to determine the period of the data. Assume that a constant fraction of the bits are 1. Then with a constant expected number of calls to the QRAM, we can prepare the n-qubit state whose amplitudes encode the unknown bitstring. With \mathcal{O}(n^2) elementary gates, we perform the QFT on this state. Then we make a single n-qubit measurement to obtain a value of k present with non-zero amplitude in its Fourier transform. We need to repeat this procedure several times in order to give an estimate of the period. But how many?

Suppose that a malicious user designs a family of classical inputs A_n and B_n of lengths 2^{2n}, where the A_n have period 2^n and the B_n are identical to the A_n except that the first 1 in the second half of the string is changed to a 0, and the first 0 to a 1, so that they have period 2^{2n}. Then the corresponding quantum states \left\vert A_n\right\rangle and \left\vert B_n\right\rangle are exponentially close in n, meaning that the error probability for distinguishing \left\vert A_n\right\rangle and \left\vert B_n\right\rangle with a single measurement is exponentially close to 1/2. Now, the QFT of \left\vert A_n\right\rangle is supported only on multiples of 2^n, which are even, so the procedure that reports A if an even k is measured and B if an odd k is measured never errs on input A_n. If measuring \left\vert B_n\right\rangle produced an odd k with non-negligible probability, this procedure would therefore discriminate the two states with a lower error probability than the distance between them allows. So, given the state \left\vert B_n\right\rangle, it is exponentially unlikely that we measure an odd value of k after performing the QFT, and we need to repeat the basic QFT procedure above exponentially many times in order to have a large probability of obtaining an odd value of k and accurately determining the period.
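
This is easy to see numerically; a small illustration (the particular random periodic block is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = 2 ** (2 * n)                        # string length
period = 2 ** n
block = (rng.random(period) < 0.5).astype(float)
block[0], block[1] = 1.0, 0.0           # make sure both symbols occur
A = np.tile(block, M // period)         # period 2^n

B = A.copy()
half = M // 2
B[half + np.nonzero(A[half:] == 1)[0][0]] = 0.0  # first 1 in second half -> 0
B[half + np.nonzero(A[half:] == 0)[0][0]] = 1.0  # first 0 in second half -> 1

a = A / np.linalg.norm(A)
b = B / np.linalg.norm(B)
print("overlap <A|B>:", np.dot(a, b))   # exponentially close to 1 in n

# Weight of the QFT of |B_n> on odd k: exponentially small, so sampling an odd
# outcome (the only witness of the broken period) takes exponentially many runs.
bhat = np.fft.ifft(b, norm="ortho")
print("P(odd k | B):", np.sum(np.abs(bhat[1::2]) ** 2))
```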

This counterexample tells us that, even for this very contrived problem, the QFT does not give a speedup over the classical algorithm, at least for the worst case. However, it’s worth pointing out that for some applications, it might be a feature that the quantum algorithm with sub-exponentially many runs of the QFT is insensitive to small deviations from perfect periodicity.

Scott Aaronson has written a paper about these kinds of issues in the context of quantum machine learning.


Fourier Transform on Finite Groups (Quantum Fourier Transform)

Often it’s useful to study functions on spaces by examining how they transform under various transformations of the spaces. This general idea gives rise to many interesting objects. For instance, if the space is the real line, we can ask how functions transform when the real line is translated. This gives rise to the Fourier transform. For functions on a sphere, we can ask about transformation under rotations. This is the origin of the spherical harmonics. On a finite cyclic group, we can ask how functions transform when the group is left- or right-multiplied by a group element. Thinking of the group elements as marking positions in space, we can think of this group action as a shift, and think of the resulting discrete Fourier transform as a discrete analogue of the standard one.

In this post, I’ll discuss briefly a generalization of the discrete Fourier transform to arbitrary finite groups. The unifying thread in all the examples above is that the functions can be organized into irreducible representations of the group of transformations being applied to the space on which they’re defined. Those transformations here will be left- and right-multiplication by group elements.

1. Definition of the QFT

Let G be a finite group of size \left\vert G\right\vert. Since G is finite, we don’t have to be particularly careful about what we mean by “functions on G”, but because it matches up with the infinite case and because we’ll want an inner-product structure, we’ll specify the space \ell^2(G) of square-summable complex-valued functions. This space has a basis \left\lbrace\delta_g\right\rbrace_{g\in G} of functions defined by \delta_g(g)=1, \delta_g(h)=0 for all h\in G, h\neq g. Following QI convention, I’ll denote this \left\vert g\right\rangle.

Let \Lambda be a maximal set of pairwise-inequivalent irreducible unitary representations of G. We know that

\begin{aligned} \sum_{\lambda\in \Lambda}d_\lambda^2=\left\vert G\right\vert \end{aligned}

where d_\lambda is the dimension of the representation \lambda\in\Lambda. We can therefore choose another orthonormal basis for \ell^2(G), with basis functions \left\vert\lambda;ij\right\rangle labeled by \lambda\in \Lambda and i,j=1,2,\ldots, d_\lambda. Now we can define the Quantum Fourier Transform (QFT) and its inverse as transformations from the basis \left\vert g\right\rangle to the basis \left\vert\lambda;ij\right\rangle and back:

\begin{aligned} U_{QFT}&=\sum_{\lambda\in\Lambda}\sum_{i,j=1}^{d_\lambda}\sum_{g\in G}\sqrt{\frac{d_\lambda}{\left\vert G\right\vert}}\left[R_\lambda(g)\right]_{ij}\left\vert\lambda;ij\right\rangle\left\langle g\right\vert\\ \\ U_{QFT}^\dagger&=\sum_{\lambda\in\Lambda}\sum_{i,j=1}^{d_\lambda}\sum_{g\in G}\sqrt{\frac{d_\lambda}{\left\vert G\right\vert}}\left[R_\lambda(g)\right]_{ij}^*\left\vert g\right\rangle\left\langle \lambda;ij\right\vert \end{aligned}

There’s nothing particularly quantum about this, other than the use of Dirac notation. However, an efficient implementation of the QFT is at the heart of Shor’s factorization algorithm. Somewhat tragically, the algorithm only uses the QFT for cyclic groups, so its full generality isn’t featured in that setting.
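
Concretely, in the cyclic case G=Z_N every irreducible representation is one-dimensional, given by a character R_k(g)=\omega^{kg} with \omega=e^{2\pi i/N} and k=0,1,\ldots,N-1, so that \sum_\lambda d_\lambda^2=N=\left\vert G\right\vert as required. The definition above then reduces to

\begin{aligned} U_\text{QFT}=\sum_{k=0}^{N-1}\sum_{g=0}^{N-1}\frac{1}{\sqrt{N}}\,\omega^{kg}\left\vert k\right\rangle\left\langle g\right\vert, \end{aligned}

which is exactly the Discrete Fourier Transform discussed above.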

The QFT can be thought of as a map from complex functions on G to complex matrix-valued functions on the inequivalent unitary irreducible representations of G, where the value of the function at \lambda is a d_\lambda\times d_\lambda matrix. A convenient way to visualize the Quantum Fourier Transform is via the following table. The row next to a group element g is the QFT of \left\vert g\right\rangle, and the complex conjugate of the column under the (i,j) matrix element of the representation \lambda is the inverse QFT of \left\vert \lambda;ij\right\rangle. In other words, the table is (the transpose of) the matrix of the unitary transformation U_\text{QFT}. The fact that the columns form an orthonormal set is the Schur orthogonality relation; this can be used to prove that U_\text{QFT} is in fact unitary, so the rows form an orthonormal set as well.

[Table image missing: rows are labeled by the group elements g\in G, columns by the pairs (\lambda;i,j), and the entry in row g and column (\lambda;i,j) is \sqrt{d_\lambda/\left\vert G\right\vert}\left[R_\lambda(g)\right]_{ij}.]

2. Transformation of Functions Under Group Action

The Fourier basis \left\vert \lambda;ij\right\rangle defined above turns out to be a useful basis for seeing how functions transform under group actions. In the group element basis, we have

U_\text{QFT}^\dagger\left\vert \tau;nm\right\rangle=\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(g)\right]_{nm}^*\left\vert g\right\rangle

For h\in G, define the operators L_h and R_h by L_h\left\vert g\right\rangle=\left\vert hg\right\rangle and R_h\left\vert g\right\rangle=\left\vert gh^{-1}\right\rangle. As representations of G on \ell^2(G), these are known as the left and right regular representations, respectively. (I made the mistake of using R for both the right regular representation of G on \ell^2(G) and the matrices of the irreducible representations of G, but it should be clear from context what is meant.) Then:

\begin{aligned} L_hU_\text{QFT}^\dagger\left\vert\tau;nm\right\rangle&=\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(g)\right]_{nm}^*\left\vert hg\right\rangle=\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(h^{-1}g)\right]_{nm}^*\left\vert g\right\rangle=\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(h^{-1})R_\tau(g)\right]_{nm}^*\left\vert g\right\rangle\\ &=\sum_k\left[R_\tau(h^{-1})\right]_{nk}^*\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(g)\right]_{km}^*\left\vert g\right\rangle=\sum_k\left[R_\tau(h^{-1})\right]_{nk}^*U_\text{QFT}^\dagger\left\vert\tau;km\right\rangle\\ &=\sum_k\left[R_\tau(h)\right]_{kn}U_\text{QFT}^\dagger\left\vert\tau;km\right\rangle\\ \\ R_hU_\text{QFT}^\dagger\left\vert\tau;nm\right\rangle&=\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(g)\right]_{nm}^*\left\vert gh^{-1}\right\rangle=\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(gh)\right]_{nm}^*\left\vert g\right\rangle=\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(g)R_\tau(h)\right]_{nm}^*\left\vert g\right\rangle\\ &=\sum_k\left[R_\tau(h)\right]_{km}^*\sum_g\sqrt{\frac{d_\tau}{\left\vert G\right\vert}}\left[R_\tau(g)\right]_{nk}^*\left\vert g\right\rangle=\sum_k\left[R_\tau(h^{-1})\right]_{mk}U_\text{QFT}^\dagger\left\vert\tau;nk\right\rangle \end{aligned}

where I’ve used the fact that these are unitary representations. Sticking a U_\text{QFT} in front of these two equations, we find

\begin{aligned} U_\text{QFT}L_hU_\text{QFT}^\dagger\left\vert\tau;nm\right\rangle&=\sum_{k=1}^{d_\tau}\left[R_\tau(h)\right]_{kn}\left\vert\tau;km\right\rangle\\ \\ U_\text{QFT}R_hU_\text{QFT}^\dagger\left\vert\tau;nm\right\rangle&=\sum_{k=1}^{d_\tau}\left[R_\tau(h^{-1})\right]_{mk}\left\vert\tau;nk\right\rangle \end{aligned}

so that

\boxed{\begin{aligned} \left\langle\lambda;ij\right\vert U_\text{QFT}L_hU_\text{QFT}^\dagger\left\vert\tau;nm\right\rangle&=\left[R_\tau(h)\right]_{in}\delta_{\lambda\tau}\delta_{jm}\\ \\ \left\langle\lambda;ij\right\vert U_\text{QFT}R_hU_\text{QFT}^\dagger\left\vert\tau;nm\right\rangle&=\left[R_\tau(h^{-1})\right]_{mj}\delta_{\lambda\tau}\delta_{in}. \end{aligned}}

What we’ve done is to use the QFT to block diagonalize the left and right regular representations of the group G on \ell^2(G). Notice that, unless G is Abelian and thus d_\lambda =1 for all \lambda, it isn’t possible to fully simultaneously diagonalize the operators L_g and R_g. Instead, we find a set of \left\vert G\right\vert=\sum_{\lambda}d_\lambda^2 basis functions on G grouped into sets of d_\lambda that transform under the representation \lambda, with multiplicity d_\lambda. We can think of these as being organized into a block diagonal matrix with blocks of size d_\lambda, one for each irreducible representation \lambda, with the left regular representation mixing functions in the same column and the right regular representation mixing functions in the same row. This is exactly how the d_\lambda\times d_\lambda matrices transform under left and right multiplications by g or g^{-1}.
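
For example, specializing to G=Z_N with R_k(h)=\omega^{kh}, the boxed formulas collapse to

\begin{aligned} U_\text{QFT}L_hU_\text{QFT}^\dagger\left\vert k\right\rangle=\omega^{kh}\left\vert k\right\rangle, \end{aligned}

the familiar statement that the Fourier transform turns shifts into phases. This is exactly the property \hat{T}\left\vert k\right\rangle=e^{2\pi ik/M}\left\vert k\right\rangle used in the period-finding discussion above (there T=L_1 and M=N).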

This analysis sheds some light on what the group Fourier transform/QFT is – a way to analyze functions on G based on how they transform under the left and right actions of G on itself – but it doesn’t necessarily explain why we should care. One answer is simply that it provides a way of seeing these functions from a different perspective. I don’t know of any “real world” examples of what looking at the Fourier transform of a function on a generic finite group tells us. For instance, Fourier transforms of time-series yield frequency-space functions, which can be used e.g. to identify periodically recurring phenomena. It would be nice to have a similarly data-processing-motivated intuition for the Fourier transform on arbitrary groups. Part of the problem is that there aren’t many cases (that I can think of) of data taking the form of a function on any interesting group. Some googling suggests that one application is to functions on symmetric groups, where the Fourier transform can be used to study mixing times for randomly shuffled decks of cards.

Glauber dynamics for the Ising chain

The goal of this post is to describe a model of a heat bath for the d=1 Ising model. In equilibrium statistical physics, a heat bath is nothing more than a parameter \beta=1/T which tells us which of the Boltzmann distributions P_\beta(s)\propto e^{-\beta H(s)} to study for a particular system with Hamiltonian H. This neither explains why such a state is reasonable to study, nor provides any information about how such a state is reached from a non-equilibrium state (vague invocations of entropy and ergodicity notwithstanding). I’ll introduce the Glauber dynamics, which reproduce the Boltzmann distributions as unique steady states of a Markov chain (with a caveat at T=0). A conceptually interesting consequence of this framework is that it forces the realization that a heat bath is not a heat bath is not a heat bath. A black box that, given an Ising chain, implements the Glauber dynamics will cause equilibration of the Ising chain, but may not do the same to some other physical system. A model of a bath must be tailored to a specific physical system.

The Glauber dynamics are related to the Metropolis algorithms used for sampling from probability distributions. This is an interesting connection that I won’t explore here.

1. Ising chain and Glauber dynamics

Consider a classical spin chain with N sites and periodic boundary conditions. In other words, consider N variables s_i, i=1,2,\ldots,N, taking values s_i=\pm 1, with the periodic identification s_{N+i}=s_i. Define the Ising Hamiltonian

\begin{aligned} H=-J\sum_{i=1}^{N}s_is_{i+1}. \end{aligned}

At each time step, pick a site j uniformly at random from 1 to N. Let \mathbf{s} be the current state of the system and let \mathbf{s}_j' be the state of the system resulting from \mathbf{s} via flipping the spin at site j, i.e. from the replacement s_j\rightarrow -s_j. Flip the spin at site j with probability given by

\begin{aligned} w(\mathbf{s}\rightarrow\mathbf{s}_j')=\frac{\alpha}{2}\left[1-\frac{\gamma}{2}s_j(s_{j-1}+s_{j+1})\right] \end{aligned}

for some value of \gamma yet to be specified (the constant \alpha sets the overall rate of update attempts; we need \alpha(1+\gamma)/2\leq 1 for w to be a probability). Note that this is indeed a Markov model. Incorporating the initial choice of j, the overall probability for the transition \mathbf{s}\rightarrow\mathbf{s}_j' is just w(\mathbf{s}\rightarrow\mathbf{s}_j')/N.
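
In code, one update looks like the following minimal numpy sketch (function and variable names are mine):

```python
import numpy as np

def glauber_step(s, gamma, alpha, rng):
    """One update: pick a site j uniformly at random and flip it with
    probability (alpha/2) * (1 - (gamma/2) * s_j * (s_{j-1} + s_{j+1}))."""
    N = len(s)
    j = rng.integers(N)
    w = 0.5 * alpha * (1.0 - 0.5 * gamma * s[j] * (s[j - 1] + s[(j + 1) % N]))
    if rng.random() < w:
        s[j] = -s[j]

# Example: a short run at inverse temperature beta, with gamma = tanh(2 beta J)
# as derived below.
rng = np.random.default_rng(0)
N, J, beta, alpha = 64, 1.0, 0.5, 1.0
gamma = np.tanh(2 * beta * J)
s = np.ones(N, dtype=int)
for _ in range(100 * N):
    glauber_step(s, gamma, alpha, rng)
print("energy per site:", -J * np.mean(s * np.roll(s, -1)))
```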

2. The Boltzmann distribution is a steady state of Glauber dynamics
Detailed balance with respect to the Boltzmann distribution P_\beta at inverse temperature \beta is the condition

\begin{aligned} \frac{w(\mathbf{s}\rightarrow\mathbf{s}_j')}{w(\mathbf{s}_j'\rightarrow\mathbf{s})}=\frac{P_\beta(\mathbf{s}'_j)}{P_\beta(\mathbf{s})}=\frac{e^{-\beta H(\mathbf{s}_j')}}{e^{-\beta H(\mathbf{s})}}. \end{aligned}

If the parameters w satisfy this condition, then P_\beta is a stationary distribution of the dynamics, and we’ll have shown that the Glauber dynamics at least preserve equilibrium (later we’ll see that, for T>0, they also drive a system to equilibrium). The right-hand side can be expanded:

\begin{aligned} e^{-\beta(H(\mathbf{s}_j')-H(\mathbf{s}))}&=\exp\left[\beta J\left(\sum_{i=1}^N(-1)^{\delta_{i,j}+\delta_{i+1,j}}s_is_{i+1}-\sum_{i=1}^Ns_is_{i+1}\right)\right]=\exp\left[-2\beta J\left(s_{j-1}s_j+s_js_{j+1}\right)\right]\\ &=e^{-2\beta Js_j(s_{j-1}+s_{j+1})}=\frac{1-\tanh \left[\beta Js_j(s_{j-1}+s_{j+1})\right]}{1+\tanh \left[\beta Js_j(s_{j-1}+s_{j+1})\right]} \end{aligned}

where the last equality is just e^{-2x}=(1-\tanh x)/(1+\tanh x). Recognizing that s_j(s_{j-1}+s_{j+1})/2 takes values 0,\pm 1 and that \tanh 0=0 and \tanh (-x)=-\tanh x, we get finally

\begin{aligned} \frac{w(\mathbf{s}\rightarrow\mathbf{s}_j')}{w(\mathbf{s}_j'\rightarrow\mathbf{s})}=\frac{1-\frac{1}{2}s_j(s_{j-1}+s_{j+1})\tanh 2\beta J}{1+\frac{1}{2}s_j(s_{j-1}+s_{j+1})\tanh2\beta J} \end{aligned}

as the detailed balance condition for P_\beta. Referring back to the proposed values of w(\mathbf{s}\rightarrow\mathbf{s}_j') above, we see that this condition is satisfied for

\gamma = \tanh2\beta J.

For very high temperatures, \beta\rightarrow 0 and \gamma\rightarrow 0, so that the dynamics do not distinguish between spin flips that raise the energy of the system and those that lower it. For low temperatures, \beta\rightarrow\infty and \gamma\rightarrow 1, so that the dynamics is much more likely to cause a spin flip that reduces the energy of the system than one that raises it. This formalizes the intuition that at low temperatures, energy considerations win, while at high temperatures, entropy dominates.
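
Since there are only a handful of distinct local neighborhoods (s_j=\pm 1 and s_{j-1}+s_{j+1}\in\left\lbrace -2,0,2\right\rbrace), the detailed balance condition can be checked exhaustively; a minimal sketch:

```python
import numpy as np

beta, J, alpha = 0.7, 1.0, 1.0
gamma = np.tanh(2 * beta * J)
for sj in (-1, 1):
    for left in (-1, 1):
        for right in (-1, 1):
            w_fwd = 0.5 * alpha * (1 - 0.5 * gamma * sj * (left + right))
            w_bwd = 0.5 * alpha * (1 + 0.5 * gamma * sj * (left + right))
            dE = 2 * J * sj * (left + right)  # H(s') - H(s) for flipping s_j
            assert np.isclose(w_fwd / w_bwd, np.exp(-beta * dE))
print("detailed balance holds for every local neighborhood")
```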

3. The Boltzmann distribution is the unique steady state for T>0

For T>0 (finite \beta), the Glauber dynamics on the state space of the Ising model forms an irreducible Markov chain with a finite state space (see e.g. Reference 3 below for a detailed overview of Markov chain machinery). Such Markov chains are positive recurrent (they visit each state infinitely many times, with finite mean time between visits) and therefore have unique stationary distributions, which we’ve already shown are the Boltzmann distributions. Moreover, since they are aperiodic, any distribution will converge to this stationary one under repeated application of the transition matrix. Physically, this is the formalization of the notion that any state will thermalize.
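
For a small chain the full 2^N\times 2^N transition matrix fits in memory, so stationarity, uniqueness, and the spectral gap can all be checked directly. A sketch for N=6 (names are mine):

```python
import numpy as np
from itertools import product

N, J, beta, alpha = 6, 1.0, 0.5, 1.0
gamma = np.tanh(2 * beta * J)
states = list(product([-1, 1], repeat=N))
index = {s: i for i, s in enumerate(states)}

# Column-stochastic transition matrix: P[new, old].
P = np.zeros((2 ** N, 2 ** N))
for s in states:
    i = index[s]
    for j in range(N):
        w = 0.5 * alpha * (1 - 0.5 * gamma * s[j] * (s[j - 1] + s[(j + 1) % N]))
        flipped = list(s); flipped[j] = -flipped[j]
        P[index[tuple(flipped)], i] += w / N
    P[i, i] = 1 - P[:, i].sum()       # probability that nothing flips

energy = lambda s: -J * sum(s[i] * s[(i + 1) % N] for i in range(N))
boltz = np.array([np.exp(-beta * energy(s)) for s in states])
boltz /= boltz.sum()
assert np.allclose(P @ boltz, boltz)  # the Boltzmann distribution is stationary

ev = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
assert np.isclose(ev[0], 1.0) and ev[1] < 1.0  # unique fixed point, spectral gap
```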

For T=0, \gamma=1, and the Markov chain is not irreducible. We are therefore not guaranteed that there is a unique stationary state, and in fact there is a continuum of them: any probabilistic combination of the two ground states of the Ising model is a stationary solution of the dynamics, since at T=0 there is no possibility of a transition out of a ground state.

4. Magnetization dynamics

Up to this point, we’ve really just used the Glauber dynamics as a framework to hold results from equilibrium statistical mechanics (the study of the Boltzmann distributions). Now we’ll see that we can use them to model something new – the fundamentally non-equilibrium dynamics of relaxation to equilibrium.

At time-step n, the probability that spin j is flipped is

\frac{1}{N}\cdot\frac{\alpha}{2}\left[1-\frac{\gamma}{2}s_j(s_{j-1}+s_{j+1})\right]

so that, given a fixed configuration \mathbf{s} at time n, the expected change in spin j is

-\frac{\alpha}{N}\left[s_j-\frac{\gamma}{2}(s_{j-1}+s_{j+1})\right],

where we’ve used the fact that s_j^2=1. Letting \left\langle\cdot\right\rangle denote the expectation over all configurations and E\left[\cdot\right] the expectation over the dynamics of a single time step, we have

\left\langle E\left[\Delta s_j\right]\right\rangle=-\frac{\alpha}{N}\left[\left\langle s_j\right\rangle-\frac{\gamma}{2}(\left\langle s_{j-1}\right\rangle+\left\langle s_{j+1}\right\rangle)\right].

Assume now that the system is prepared in a translationally invariant state, i.e. \left\langle s_j\right\rangle=m for all j. The dynamics preserve this property, so we can write

\Delta m=-\frac{\alpha}{N}\left(1-\gamma\right)m.

For large times (many iterations of the Glauber dynamics) we can approximate this by

\frac{dm}{dt}=-\frac{\alpha}{N}(1-\gamma)m,

which yields

m(t)=m(0)e^{-\alpha(1-\gamma)t/N}.

Therefore for T>0, \gamma<1 and the average magnetization decays exponentially in time (since we know the stationary solution is the Boltzmann distribution, the limit m(\infty)=0 checks out). For T=0, \gamma=1 and m is conserved. This is possible due to the fact that there is a path through state space of non-increasing energy from any non-ground state to either ground state. For example, imagine preparing the system in a uniform distribution over the states with all spins down but one. The energy of such a state may be reduced by aligning the odd spin, but with some probability, the Glauber dynamics will move the two domain walls apart, eventually annihilating them after a full circle so that the system ends up in the spin-up ground state.
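
The exponential decay is easy to check against a direct simulation. A sketch (a Monte Carlo average over independent runs; the exact discrete-time prediction is m(t)=m(0)\left(1-\alpha(1-\gamma)/N\right)^t, which matches the exponential above for large N, and the agreement below is up to statistical error of order \text{trials}^{-1/2}):

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, alpha, beta = 32, 1.0, 1.0, 0.3
gamma = np.tanh(2 * beta * J)
trials, steps = 500, 20 * N

m_avg = np.zeros(steps)
for _ in range(trials):
    s = np.ones(N)                    # translationally invariant start, m(0) = 1
    for t in range(steps):
        m_avg[t] += s.mean()
        j = rng.integers(N)
        w = 0.5 * alpha * (1 - 0.5 * gamma * s[j] * (s[j - 1] + s[(j + 1) % N]))
        if rng.random() < w:
            s[j] = -s[j]
m_avg /= trials

t = np.arange(steps)
predicted = (1 - alpha * (1 - gamma) / N) ** t
print("max deviation from prediction:", np.max(np.abs(m_avg - predicted)))
```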

References

1. http://physics.bu.edu/~redner/896/spin.pdf
2. http://fig.if.usp.br/~oliveira/PRE_67_066101_2003.pdf
3. http://www.columbia.edu/~ks20/stochastic-I/stochastic-I-MCII.pdf

Long-range order in a scalar field at thermal equilibrium

This calculation works through an example of a system in which long-range order (in a sense made precise below) appears in spatial dimensions d\geq 3 but not in dimensions d=1,2:

Suppose that a real scalar field u is defined on a d-dimensional cube C of side length L, with boundary condition u\vert_{\partial C}\equiv 0 and an upper bound on the frequency components of the field, as would be the case if we were trying to approximate a scalar field defined on a lattice. Let the system be governed by a Hamiltonian given by

\begin{aligned} H(u)=\frac{c}{2}\int_C\nabla u\cdot\nabla u\hspace{2pt}d^dr. \end{aligned}

At nonzero temperatures, for points x and y close to the center of the cube, the thermal average of the squared field variation has behavior varying with dimension:

\begin{aligned} \left\langle(u(x)-u(y))^2\right\rangle_\beta&\sim\left\vert x-y\right\vert\hspace{20pt}&d=1\\ &\leq \text{const}&d\geq 3 \end{aligned}

so that for dimensions greater than two, the field is roughly constant no matter how far away you look, whereas for dimension one, the thermal fluctuations destroy this long-range order. The d=2 case apparently gives logarithmic dependence on separation (see e.g. these lecture notes), but I don’t show it here.

We’ll demonstrate that this behavior exists in a few steps:

1. Mode decomposition of Hamiltonian

We can start by decomposing the scalar field into Fourier modes:

u(x)=\sum_n\hat{u}_{n}\phi_n(x)\hspace{20pt}\phi_n(x)=\prod_{i=1}^{d}\sin\left(\frac{n_i\pi x_i}{L}\right)

where n is a d-dimensional vector of integers ranging from 1 to some cutoff N. In this basis, evaluating the Hamiltonian is simple:

\begin{aligned} H(u)&=-\frac{c}{2}\int_Cu\nabla^2ud^dx=\frac{c\pi^2}{2L^2}\sum_{nm}n^2\hat{u}_n\hat{u}_m\int_C\phi_n\phi_md^dx=\frac{c\pi^2}{2L^2}\sum_{nm}n^2\hat{u}_n\hat{u}_m\left(\frac{L}{2}\right)^d\delta_{nm}\\ &=2^{-(d+1)}c\pi^2L^{d-2}\sum_{n}n^2\hat{u}_n^2. \end{aligned}

The first equality resulted from the given Hamiltonian by integration by parts. The modes are non-interacting, which simplifies the following calculation.

2. Calculation of frequency-space correlation function

Consider the following thermal correlation function:

\left\langle\hat{u}_{m_1}\hat{u}_{m_2}\right\rangle_\beta=\frac{\left(\prod_n\int_{-\infty}^{\infty}d\hat{u}_n\right)\hat{u}_{m_1}\hat{u}_{m_2}e^{-\beta H(u)}}{\left(\prod_n\int_{-\infty}^{\infty}d\hat{u}_n\right)e^{-\beta H(u)}}.

By the symmetry of the integrand, it’s clear that the correlation function vanishes unless m_1=m_2. In this case:

\left\langle\hat{u}_{m}^2\right\rangle_\beta=\frac{\int_{-\infty}^{\infty}d\hat{u}_m\hat{u}_m^2e^{-c\pi^2L^{d-2}\beta m^2\hat{u}_m^2/2^{d+1}}}{\int_{-\infty}^{\infty}d\hat{u}_me^{-c\pi^2L^{d-2}\beta m^2\hat{u}_m^2/2^{d+1}}}=\frac{1}{2}\left(c\pi^2L^{d-2}\beta m^2/2^{d+1}\right)^{-1}=\left(\frac{2^d}{c\pi^2}\right)\beta^{-1}m^{-2}L^{2-d},

so that overall we get

\left\langle\hat{u}_{m}\hat{u}_{n}\right\rangle_\beta=\delta_{mn}\left(\frac{2^d}{c\pi^2}\right)\beta^{-1}n^{-2}L^{2-d}.

Since the factorization of the partition function is a direct consequence of the absence of mode-coupling terms in the Hamiltonian, we just had to do a single-mode calculation and then write this answer down.

3. Calculation of average squared disorder

Expanding the squared difference between field values at points x and y in terms of the Fourier modes, we get

\begin{aligned} (u(x)-u(y))^2&=\left(\sum_n\hat{u}_n\left(\phi_n(x)-\phi_n(y)\right)\right)^2\\ &=\sum_{nm}\hat{u}_n\hat{u}_m\left(\phi_n(x)-\phi_n(y)\right)\left(\phi_m(x)-\phi_m(y)\right). \end{aligned}

Taking the expectation using the frequency-space correlation function:

\begin{aligned} \left\langle(u(x)-u(y))^2\right\rangle_\beta&=\sum_{nm}\left(\phi_n(x)-\phi_n(y)\right)\left(\phi_m(x)-\phi_m(y)\right)\delta_{mn}\left(\frac{2^d}{c\pi^2}\right)\beta^{-1}n^{-2}L^{2-d}\\ &=\left(\frac{2^d}{c\pi^2}\right)\beta^{-1}L^{2-d}\sum_{n}\frac{\left(\phi_n(x)-\phi_n(y)\right)^2}{n^2} \end{aligned}

Note that up to this point, everything has been exact (for the continuum theory defined in the problem statement, which itself could be used as an approximation of a lattice system).

4. Approximations for different values of d

For d=1:
\begin{aligned} \left\langle(u(x)-u(y))^2\right\rangle_\beta&=\left(\frac{2^d}{c\pi^2}\right)\beta^{-1}L^{2-d}\sum_{n=1}^N\frac{\left(\sin\left(\frac{n\pi x}{L}\right)-\sin\left(\frac{n\pi y}{L}\right)\right)^2}{n^2} \end{aligned}

For very large L and x, y close to L/2 on the scale of L, the summand varies slowly in n, so we can approximate the sum by an integral. For sufficiently large N (which, at a fixed frequency cutoff, is achieved by taking L large enough), the tail of the integrand is heavily suppressed, so we can extend the upper limit to infinity without too much error and evaluate the resulting integral (I did it in Mathematica):

\begin{aligned} \left\langle(u(x)-u(y))^2\right\rangle_\beta&\approx\left(\frac{2^d}{c\pi^2}\right)\beta^{-1}L^{2-d}\int_0^\infty\frac{\left(\sin\left(\frac{n\pi x}{L}\right)-\sin\left(\frac{n\pi y}{L}\right)\right)^2}{n^2}dn\\ &=\left(\frac{2}{L}\right)^{d-1}c^{-1}\beta^{-1}\left\vert x-y\right\vert \end{aligned}

For d\geq 3 the cutoff is significant, as the integrand grows with increasing n (due to the factor of n^{d-1} in the measure). The largest the numerator of the integrand can be is four, since it’s the square of a difference between products of sines. Fixing the integrand to its maximum value and integrating up to the cutoff frequency gives an upper bound independent of x and y.
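
The dimension dependence of the mode sum \sum_n\left(\phi_n(x)-\phi_n(y)\right)^2/n^2 is easy to check numerically. A sketch (box size and cutoffs chosen arbitrarily; for d=1 the sum should grow roughly as \pi^2\left\vert x-y\right\vert/2L once L/N\ll\left\vert x-y\right\vert\ll L, while for d=3 it should saturate to a cutoff-dependent constant):

```python
import numpy as np
from itertools import product

L = 100.0
center = L / 2

def S1(r, Nc=4000):
    """d = 1 mode sum, x and y straddling the center of the box."""
    n = np.arange(1, Nc + 1)
    x, y = center - r / 2, center + r / 2
    return np.sum((np.sin(n * np.pi * x / L) - np.sin(n * np.pi * y / L)) ** 2 / n ** 2)

def S3(r, Nc=25):
    """d = 3 mode sum over the cube {1..Nc}^3; here the cutoff matters."""
    x = np.full(3, center)
    y = x.copy(); y[0] += r
    total = 0.0
    for n in product(range(1, Nc + 1), repeat=3):
        n = np.asarray(n, dtype=float)
        dphi = np.prod(np.sin(n * np.pi * x / L)) - np.prod(np.sin(n * np.pi * y / L))
        total += dphi ** 2 / (n @ n)
    return total

for r in [2.0, 4.0, 8.0, 16.0]:
    print(f"r={r:5.1f}  S1={S1(r):.4f}  pi^2 r/(2L)={np.pi**2 * r / (2 * L):.4f}  S3={S3(r):.3f}")
```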

A Holevo quantity inequality

For some reason, I wanted to know the following fact at some point.

Let (p_i,\rho_i) be an ensemble of states of a bipartite system AB, and write \rho_i^A=\mathrm{Tr}_B\,\rho_i and \rho_i^B=\mathrm{Tr}_A\,\rho_i for the reduced states. For \chi the Holevo information, we have \chi_{AB}\leq\chi_A+\chi_B+\bar{I}, where \bar{I}=\sum_ip_iI_i is the expected value of the quantum mutual information I_i=S(\rho_i^A)+S(\rho_i^B)-S(\rho_i^{AB}).

\begin{aligned} \chi_{AB}&=S\left(\sum_{i}p_i\rho_i\right)-\sum_{i}p_iS(\rho_i)\\ &\leq S\left(\sum_{i}p_i\rho^A_i\right)+S\left(\sum_{i}p_i\rho^B_i\right)-\sum_{i}p_iS(\rho_i)\\ &= S\left(\sum_{i}p_i\rho^A_i\right)+\sum_{i}p_iS(\rho^A_i)-\sum_{i}p_iS(\rho^A_i)+S\left(\sum_{i}p_i\rho^B_i\right)-\sum_{i}p_iS(\rho_i)\\ &= \chi_A+\chi_B+\sum_{i}p_iS(\rho^A_i)+\sum_{i}p_iS(\rho^B_i)-\sum_{i}p_iS(\rho_i)\\ &= \chi_A+\chi_B+\sum_{i}p_i\left(S(\rho^A_i)+S(\rho^B_i)-S(\rho_i)\right)\\ &= \chi_A+\chi_B+\sum_{i}p_iI_i  \end{aligned}
The inequality in the second line is the subadditivity of the von Neumann entropy, S(\rho^{AB})\leq S(\rho^A)+S(\rho^B), applied to \sum_ip_i\rho_i.

Since the Holevo information gives an upper bound for the mutual information between the random variable X\sim (p_i) and the outcome of any measurement that can be made on the received state, setting \chi_A=\chi_B=0 we see that \bar{I} may be meaningfully taken as an upper bound for the amount of hidden information.
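
For what it’s worth, the inequality is easy to spot-check numerically on random ensembles. A minimal numpy sketch (helper names are mine; entropies in bits):

```python
import numpy as np

def rand_rho(d, rng):
    """Random density matrix: G G^dagger normalized, with G complex Gaussian."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def ptrace(rho, dA, dB, keep):
    """Partial trace of a (dA*dB) x (dA*dB) bipartite state over one factor."""
    r = rho.reshape(dA, dB, dA, dB)
    return np.trace(r, axis1=1, axis2=3) if keep == "A" else np.trace(r, axis1=0, axis2=2)

def holevo(states, probs):
    avg = sum(q * r for q, r in zip(probs, states))
    return entropy(avg) - sum(q * entropy(r) for q, r in zip(probs, states))

rng = np.random.default_rng(0)
dA = dB = 2
k = 4
p = rng.random(k); p /= p.sum()
rhos = [rand_rho(dA * dB, rng) for _ in range(k)]

chi_AB = holevo(rhos, p)
chi_A = holevo([ptrace(r, dA, dB, "A") for r in rhos], p)
chi_B = holevo([ptrace(r, dA, dB, "B") for r in rhos], p)
Ibar = sum(q * (entropy(ptrace(r, dA, dB, "A")) + entropy(ptrace(r, dA, dB, "B")) - entropy(r))
           for q, r in zip(p, rhos))
print(chi_AB, "<=", chi_A + chi_B + Ibar)
assert chi_AB <= chi_A + chi_B + Ibar + 1e-9
```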