<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Elliptigon]]></title><description><![CDATA[The Learning Community]]></description><link>https://elliptigon.com/</link><image><url>https://elliptigon.com/favicon.png</url><title>Elliptigon</title><link>https://elliptigon.com/</link></image><generator>Ghost 3.2</generator><lastBuildDate>Thu, 19 Mar 2026 02:49:20 GMT</lastBuildDate><atom:link href="https://elliptigon.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Quaternions: the unspoken heroes]]></title><description><![CDATA[Are 4 dimensions too much for rotations in 3 dimensions? Quaternions show us otherwise.   


]]></description><link>https://elliptigon.com/quaternions-the-unspoken-heroes/</link><guid isPermaLink="false">6726658ddd692904fd9826af</guid><dc:creator><![CDATA[Ashwath Ravindran]]></dc:creator><pubDate>Mon, 04 Nov 2024 18:02:18 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2024/11/abstract-blur-of-fiber-optics-light.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://elliptigon.com/content/images/2024/11/abstract-blur-of-fiber-optics-light.jpg" alt="Quaternions: the unspoken heroes"><p>If you've ever played a video game or watched a smooth 3D animation, you've likely enjoyed the benefits of quaternions without realizing it. These mathematical entities make all those rotations in 3D space feel natural. Consider your character in a game: whether it's turning their head or changing direction mid-jump, those fluid movements are powered by some interesting math.</p><p>Quaternions are a 4-dimensional number system represented as $a + b\hat{i} + c\hat{j} + d\hat{k}$. They help you avoid problems like gimbal lock, where a system loses one of its degrees of rotational freedom.</p><p>We'll explain more about gimbal lock later, but imagine playing a game where the camera suddenly freezes when you tilt it beyond a certain angle. You would lose one axis of rotation, or the camera might freeze up entirely, ruining your experience. Quaternions prevent such issues, keeping everything fluid and smooth even in intense situations.</p><p>This is just one example of how quaternions operate behind the scenes in daily life, usually without anyone noticing.</p><p>If you’re into math, quaternions are a fascinating topic. 
They invite you to think about numbers and rotations in unfamiliar ways, while also having real applications in game development, animation, aerospace engineering, and robotics.</p><p>Quaternions extend the idea of complex numbers to 4D space. They are not just a convenient tool for problems involving rotation, stability, and control; they also embody a deeper mathematical structure that connects algebra, geometry, and even physics. For instance:</p><blockquote>Quaternion multiplication is non-commutative.</blockquote><p>In most familiar number systems, the order of multiplication does not matter:</p><p>$$ab=ba$$</p><p>With quaternions, however, $\hat{i}\hat{j}=\hat{k}$ but $\hat{j}\hat{i}=-\hat{k}$. This property, called non-commutativity, is what sets them apart from the real and complex numbers.</p><h1 id="so-what-are-quaternions">So, what are quaternions?</h1><p>Quaternions are popularly defined as <a href="https://www.youtube.com/watch?v=3BR8tK-LuB0">four-dimensional numbers or a four-dimensional extension of complex numbers.</a> But what does this actually mean? A dimension is essentially the number of independent values needed to uniquely specify something in a system. To illustrate: suppose you have a road full of potholes and want an address for each pothole. You need only two dimensions, the horizontal and vertical distances from a reference point.</p><p>Another important point when relating dimensions to real-life examples: dimensions are the minimum number of values required to specify the system in question. In the pothole example, we could add plenty of redundant values, like the left half of its area or the top-left sector, yet we only need two distinct values. 
This is similar to how we use multiple parameters to describe an address, like the house number, street, and area code, even though the latitude and longitude alone would suffice.</p><p>And an address must be unique, right? In other words, one address should correspond to only one location. You can’t have two houses with the same address.</p><p>This is the basic idea. We can extend it to mathematical objects: the dimension is the number of linearly independent values necessary to specify a point. Linearly independent means you can’t derive one value from the others. For example, a line needs only $1$ value, the distance from a reference point, to specify each point on it. So it is one-dimensional (1D). The world we live in can be described as a three-dimensional (3D) space, since we need length, width, and height from a reference point to specify a location.</p><p>All right, now let's apply the idea of dimensions to numbers. For a real number, all the information is contained in the one value, like $2$ or $\pi$ or $e$, that we use to specify it. For a complex number, however, two real numbers are required, which is what makes it two-dimensional (2D):</p><p>$$ a + bi $$</p><p>where $a,b \in \mathbb{R}$ and $i$ is the imaginary unit $\sqrt{-1}$. Here $a$ is called the real part and $b$ the imaginary part of the complex number.</p><p>A complex number can in fact be represented on a plane, called the Argand plane. The real part is plotted on the x-axis and the imaginary part on the y-axis.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2024/11/image--7-.png" class="kg-image" alt="Quaternions: the unspoken heroes"><figcaption>Source: E.O. 
(https://math.stackexchange.com/users/18873/e-o), Argand Diagram - Quadrants help, URL (version: 2012-06-05): https://math.stackexchange.com/q/154025</figcaption></figure><h1 id="understanding-the-quaternion-group">Understanding the quaternion group</h1><p>A group is a set equipped with an operation that obeys a few rules which allow us to describe symmetries. Analyzing the structure of a group can tell us a lot about the object it acts on, helping us characterize and classify that object. Studying groups also helps us make connections between different parts of math and science.</p><p>The set of basis elements of the quaternions, $\{1, -1, \hat{i}, -\hat{i}, \hat{j}, -\hat{j}, \hat{k}, -\hat{k}\}$, actually forms a group called $\mathbb{Q}_{8}$, with multiplication as the group operation. But what are these elements, really? Are they just abstract symbols? They are more than that, and we have a trick up our sleeve to show what they mean practically: matrices! Yes, that’s right, quaternions can be represented as matrices, specifically $2 \times 2$ complex matrices. This is analogous to <a href="https://link.springer.com/book/10.1007/978-0-387-78214-0">how complex numbers are represented as $2 \times 2$ real matrices.</a></p><p>$$ q=\left(\begin{array}{cc}a+i d &amp; -b-i c \\b-i c &amp; a-i d\end{array}\right) $$</p><p>When switching to this matrix representation we have to make sure the defining properties are retained; for instance, the algebra dictates how $\hat{i}, \hat{j}$ and $\hat{k}$ interact, and this must be preserved. 
So, accordingly, the elements of the basis set of the $\mathbb Q_8$ group, represented in terms of matrices, are given as follows:</p><p>$$ 1=\left(\begin{array}{ll} 1 &amp; 0 \\ 0 &amp; 1 \end{array}\right), \quad \hat{i}=\left(\begin{array}{cc} 0 &amp; -1 \\ 1 &amp; 0 \end{array}\right), \quad \hat{j}=\left(\begin{array}{cc} 0 &amp; -i \\ -i &amp; 0 \end{array}\right), \quad \hat{k}=\left(\begin{array}{cc} i &amp; 0 \\ 0 &amp; -i \end{array}\right) $$</p><p>However, the representation can also be restructured to incorporate something that physics aficionados will definitely be familiar with, the Pauli matrices $\sigma_x, \sigma_y$ and $\sigma_z$:</p><p>$$ \begin{aligned}&amp; \sigma_1=\sigma_x=\left(\begin{array}{ll}0 &amp; 1 \\1 &amp; 0\end{array}\right) \\&amp; \sigma_2=\sigma_y=\left(\begin{array}{cc}0 &amp; -i \\i &amp; 0\end{array}\right) \\&amp; \sigma_3=\sigma_z=\left(\begin{array}{cc}1 &amp; 0 \\0 &amp; -1\end{array}\right)\end{aligned} $$</p><p>In terms of these, our $\hat{i}, \hat{j}, \hat{k}$ matrices can be rewritten as:</p><p>$$ \begin{aligned} &amp; \hat{i}= i \sigma_z \\ &amp; \hat{j} = -i \sigma_y \\ &amp; \hat{k} = -i \sigma_x \end{aligned} $$</p><p>If you look carefully, the quaternion algebra is still preserved <a href="https://ocw.mit.edu/courses/8-05-quantum-physics-ii-fall-2013/a56c316e4ec548d8bd0a554fac9a74e8_MIT8_05F13_Chap_02.pdf">(recall the Pauli matrix multiplication rules)</a>. For example: $\hat{i} \times \hat{j} = (i \times -i) \times (\sigma_z \times \sigma_y) = 1 \times -i\sigma_x = -i\sigma_x = \hat{k}$.</p><p>This is indeed a cool link between the mathematics of <a href="https://physicstravelguide.com/basic_notions/spin#tab__intuitive">spin</a> and quaternions!</p><p>Now that we know what these elements are, let's verify that they form a group by checking that they obey the rules, i.e. the <a href="https://groupprops.subwiki.org/wiki/Group">group axioms</a>. 
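</p><p>Before checking the axioms, it is easy to confirm this matrix representation numerically. Here is a minimal sketch (pure Python with built-in complex numbers; the helper functions are our own, not from any library) that multiplies the $2 \times 2$ matrices for $\hat{i}$, $\hat{j}$ and $\hat{k}$ given above and confirms the defining relations:</p>

```python
# 2x2 complex-matrix representation of the quaternion units,
# matching the matrices shown in the text.
ONE = [[1, 0], [0, 1]]
QI  = [[0, -1], [1, 0]]
QJ  = [[0, -1j], [-1j, 0]]
QK  = [[1j, 0], [0, -1j]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[r][t] * b[t][c] for t in range(2)) for c in range(2)]
            for r in range(2)]

def neg(a):
    return [[-x for x in row] for row in a]

assert matmul(QI, QJ) == QK        # i j = k
assert matmul(QJ, QI) == neg(QK)   # j i = -k: non-commutative!
assert matmul(QI, QI) == neg(ONE)  # i^2 = -1
assert matmul(QJ, QJ) == neg(ONE)  # j^2 = -1
assert matmul(QK, QK) == neg(ONE)  # k^2 = -1
print("quaternion relations verified")
```

<p>Every product of two of the eight basis matrices lands back in the set $\{\pm 1, \pm\hat{i}, \pm\hat{j}, \pm\hat{k}\}$: a first glimpse of the closure property. 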
But before that, let’s motivate why these rules are what they are. We said earlier that groups are related to symmetries, so the axioms can be thought of as properties of symmetries:</p><ul><li><strong>Closure</strong>: A symmetry leaves the object unchanged, so doing one symmetry after another obviously also leaves the object unchanged. This is what closure is all about: composing elements within a group stays within the group. The group is like a box with the lid closed.</li><li><strong>Associativity:</strong> Combining two symmetries first and then combining the third with the result is the same as combining the first with the result of combining the second and third. Imagine an arrow pointing to your left: flip it about the vertical axis, rotate it 90 degrees clockwise, then rotate the result a further 30 degrees. Now restart from the beginning, but this time flip it first and then apply a single 120-degree rotation; you end up with the same thing. How the operations are grouped does not change the outcome.</li><li><strong>Identity:</strong> It’s the element that does nothing when combined with any other element, like that one teammate who is just there but does nothing to change the outcome of anything. Doing nothing is obviously also a symmetry, so the group must contain an identity element.</li><li><strong>Inverses:</strong> Inverses ensure that every action can be undone, adding reversibility and completeness. Undoing a symmetry must itself be a symmetry, since it too leaves the object unchanged. Therefore, every element in a group has a unique undo button, a "ctrl + z" that takes you back to the identity element. 
For any element $a$, there’s another element $a^{-1}$ that, when combined with it, returns you to the identity.</li></ul><p>Now that you have a concrete understanding of the group axioms, let us show you a cool way to visualize this group: a Cayley graph.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2024/11/image--6-.png" class="kg-image" alt="Quaternions: the unspoken heroes"><figcaption>Source: Sullivan.t.j at en.wikipedia derivative work: M0tty, <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons</figcaption></figure><p>A Cayley graph is a graphical representation of a group in which each node represents a group element, and each arrow represents composition with a particular element of the group.</p><h1 id="quaternions-vs-matrices-and-euler-angles">Quaternions vs Matrices and Euler angles</h1><p>When it comes to representing rotations in 3D space, Euler angles, rotation matrices, and quaternions each have advantages and disadvantages of their own.</p><p>Euler angles are the easiest to understand because they break rotations into three separate components: pitch (up/down), yaw (left/right), and roll (tilting). This simplicity makes Euler angles easy to work with and visualize, especially in applications like flight simulators or camera controls. However, Euler angles suffer from a major limitation: rotating about one axis affects the position of the other axes. What we mean is illustrated in the diagram below:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2024/11/image--5-.png" class="kg-image" alt="Quaternions: the unspoken heroes"><figcaption>Source: Mathspoetry <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a> via Wikimedia Commons</figcaption></figure><p>Let’s say the red, blue, and green rings represent the different axes (x, y, and z). 
Imagine that the rings are connected. Rotating one ring also changes the position of the other rings, and what happens when two of the rings align? You have effectively lost an axis of rotation. Look at the gif below to see what we mean:</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2024/11/Gimbal_Lock_Plane--1-.gif" class="kg-image" alt="Quaternions: the unspoken heroes"></figure><p>Here, after the green and pink axes align, you can only rotate about two axes instead of three. Even when gimbal lock doesn’t occur, the shifting of axes is a problem: if we want to rotate an object about the x-axis, it may instead rotate about some other, unintended axis.</p><p>Rotation matrices are a more reliable option for systems that need accurate and continuous rotations, since they avoid gimbal lock entirely. A rotation matrix is a $3 \times 3$ matrix that can represent any 3D rotation. When paired with other matrices it can also handle scaling and translation in addition to rotations, which makes it useful for applications such as computer graphics and physics simulations. Rotation matrices do have certain drawbacks, though. To start, they are inefficient compared to quaternions: a quaternion needs only four parameters to represent a rotation, whereas a rotation matrix requires nine. <a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10019271" rel="noopener noreferrer">Multiplying rotation matrices also takes a computer longer than multiplying quaternions</a>.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2024/11/avengers-infinity-war-why-is-gamora.gif" class="kg-image" alt="Quaternions: the unspoken heroes"></figure><h1 id="rotations-using-quaternions">Rotations using quaternions</h1><p>Now the question is: how do quaternions produce rotations? First of all, what is a rotation? 
A rotation is a transformation that obeys three rules, and if you think about the rotations you are used to in daily life, each rule will feel obvious.</p><ol><li>It must be a <a href="https://www.khanacademy.org/math/linear-algebra/matrix-transformations/linear-transformations/a/visualizing-linear-transformations">linear transformation</a>.</li><li>Lengths and distances must be preserved: Imagine two arrows on a globe; rotating the globe preserves the lengths of both arrows and the distance between them.</li><li><a href="https://mathworld.wolfram.com/VectorOrientation.html">Orientations</a> must be preserved: Using the same example, notice that the first arrow is still on the left/right (however you imagined it) of the second.</li></ol><p>The set of matrices which obey these conditions in three dimensions for <a href="https://mathworld.wolfram.com/EuclideanSpace.html">Euclidean vectors</a> forms a group called $SO(3)$. For a matrix $M$ the conditions are $MM^T = I$ and $\det(M) = 1$; these are just rules 2 and 3 above, expressed for matrices.</p><p>Okay, let's get back to quaternions. For them, rotations are described by $qvq^{-1}$. Here $q$ is a unit quaternion, i.e. a quaternion of unit length,</p><p>$$ |q| = \sqrt{a^2 + b^2 + c^2 + d^2} =1 $$</p><p>and $q^{-1}$ is its inverse, given by</p><p>$$ q^{-1} = \frac{a-b\hat{i}-c\hat{j}-d\hat{k}}{a^2+b^2+c^2+d^2}  = \frac{q^{*}}{|q|^2} $$</p><p>Meanwhile, $v$ is what is called a pure quaternion: a quaternion with real part zero, i.e. $a = 0$. 
This is how we bring 4D quaternions into our 3D world: a pure quaternion can be thought of as a vector in $\mathbb{R}^3$ instead of $\mathbb{R}^4$, and<a href="https://link.springer.com/book/10.1007/978-0-387-78214-0"> conjugating it by a unit quaternion keeps it in that space</a>.</p><p>If you were paying close attention, we still haven't shown you why $qvq^{-1}$ is a rotation; we just told you that it is. <a href="https://link.springer.com/book/10.1007/978-0-387-78214-0">Here is why: conjugation by a unit quaternion always preserves distances (it is an isometry)</a>, so sandwiching a pure quaternion between a unit quaternion and its inverse preserves distances. This sandwiching also preserves lengths because of the multiplicative absolute value property of quaternions ($|q_1q_2| = |q_1||q_2|, \text{ so } |qvq^{-1}| = |q||v||q^{-1}|$). Finally, orientations are preserved because unit quaternions have determinant 1 ($\det(q) = |q|^2 = 1$ in the matrix representation). So there you have it: all the properties of a rotation are satisfied.</p><p>Many of the problems caused by rotation matrices and Euler angles are resolved by quaternions. They can describe any 3D rotation with just four real numbers and provide smooth rotations without the problem of gimbal lock. Quaternions are well suited to real-time applications like video games, animation, and spacecraft navigation, since they are also far less prone to numerical drift and are more computationally efficient than matrices. Despite these advantages, quaternions live in four dimensions, which makes them challenging to visualize and understand intuitively. Extracting useful angles such as pitch, yaw, and roll from a quaternion takes extra computation, which makes direct manipulation of those angles less convenient. 
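</p><p>The $qvq^{-1}$ recipe is easy to try out yourself. Below is a short, dependency-free sketch (the helper functions are our own, not from any library): rotating the vector $(1, 0, 0)$ by $90°$ about the z-axis with the unit quaternion $q = \cos(\theta/2) + \sin(\theta/2)\hat{k}$ should give $(0, 1, 0)$:</p>

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def rotate(v, axis, theta):
    """Rotate 3-vector v by angle theta about a unit axis, via q v q^{-1}."""
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    q    = (c, s * axis[0], s * axis[1], s * axis[2])  # unit quaternion
    qinv = (q[0], -q[1], -q[2], -q[3])                 # conjugate = inverse, since |q| = 1
    vq   = (0.0, v[0], v[1], v[2])                     # v as a pure quaternion
    return qmul(qmul(q, vq), qinv)[1:]                 # real part comes out ~0

x_rotated = rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
print([round(c, 6) + 0.0 for c in x_rotated])  # [0.0, 1.0, 0.0]
```

<p>No trigonometric bookkeeping per axis, no $3 \times 3$ matrix: just two quaternion multiplications. 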
<a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10019271">However, quaternions are more efficient and stable when it comes to handling complex rotations, which is why they are so widely used in practice.</a></p><h1 id="in-summary">In summary</h1><p>Quaternions are a beautiful and useful mathematical idea: they are 4-dimensional numbers, they describe rotations, and they carry a rich group structure, all while being practical in the many ways we mentioned in this article. Their usefulness, however, doesn’t stop at describing 3-dimensional rotations. They offer a unified perspective in certain areas of physics, like quantum mechanics, relativity, and the standard model of particle physics. <a href="https://arxiv.org/abs/hep-th/9508011">For instance, in special relativity quaternions can be used to provide a compact alternative form of space-time transformations.</a></p><p><a href="https://math.ucr.edu/home/baez/standard/">The standard model can even be re-expressed using octonions, which are 8D extensions of quaternions</a>.</p><p>Our hope is that by the end of this article you have developed an appreciation for the beautiful math of quaternions and that you carry this appreciation forward into other areas of math as well.</p>]]></content:encoded></item><item><title><![CDATA[When Born-Oppenheimer Fails]]></title><description><![CDATA[Most of our models in quantum chemistry are wrong.....kinda, read this article by Pranay to find out where key assumptions go wrong]]></description><link>https://elliptigon.com/when-born-oppenheimer-fails/</link><guid isPermaLink="false">607d7cdb2ab5f40dde434869</guid><category><![CDATA[Born Approximation]]></category><category><![CDATA[quantum mechanics]]></category><dc:creator><![CDATA[Pranay Venkatesh]]></dc:creator><pubDate>Sat, 24 Apr 2021 22:00:00 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2021/04/Screenshot-from-2021-04-24-15-37-50.png" 
medium="image"/><content:encoded><![CDATA[<h2 id="prelude-quantum-chemistry-is-hard">Prelude : Quantum Chemistry is hard</h2><img src="https://elliptigon.com/content/images/2021/04/Screenshot-from-2021-04-24-15-37-50.png" alt="When Born-Oppenheimer Fails"><p>Quantum mechanics hinges on a beautiful equation that yields the possible states of a microscopic system: the Schrodinger equation. </p><p>$$\hat{H} \psi = E \psi$$</p><p>This "state" describes how the probability density of a system is distributed. Specifically, the Schrodinger equation yields energy states, the wavefunctions obtained when we're interested in measuring energy. The energy "operator", which acts on the wavefunction to produce energy states, is formulated from the conditions in which we prepare our system. Operator equations such as these are typically partial differential equations, i.e. differential equations in many variables. To solve them, we try to separate the variables and create individual single-variable equations. Each new particle added to the system adds 3 new variables to the mix, one for each position co-ordinate. Furthermore, the energy operator becomes significantly more complex, gaining a new kinetic energy term and several potential energy terms (written below in atomic units, with $n, m$ indexing electrons and $\alpha, \beta$ indexing nuclei of charge $Z$ and mass $M$):</p><p>$$ \hat{H} =-\frac{1}{2} \sum_{n} \nabla ^{2}_{n} - \sum_{\alpha} \frac{1}{2 M_{\alpha}} \nabla ^{2}_{\alpha} + \frac{1}{2}\sum_{n} \sum_{m \neq n} \frac{1}{r_{mn}} - \sum_{n} \sum_{\alpha} \frac{Z_{\alpha}}{r_{\alpha n}} + \frac{1}{2}\sum_{\alpha} \sum_{\beta \neq \alpha} \frac{Z_{\alpha} Z_{\beta}}{r_{\alpha \beta}}$$</p><p>Quantum chemistry applies the great machinery of quantum mechanics to atoms and molecules, which are complex many-particle systems. Each atom in a molecule has a nucleus and several electrons. Each particle, and each interaction between particles, contributes a term to the energy operator. 
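</p><p>To get a feel for how quickly this blows up, we can simply count. A tiny sketch (the molecule choice, water, is ours; the counting itself is just combinatorics):</p>

```python
from math import comb

# Term counting for the molecular Hamiltonian of water: 10 electrons, 3 nuclei.
n_e, n_nuc = 10, 3

variables = 3 * (n_e + n_nuc)   # position coordinates in the PDE
kinetic   = n_e + n_nuc         # one Laplacian per particle
e_e       = comb(n_e, 2)        # electron-electron repulsion terms
e_nuc     = n_e * n_nuc         # electron-nucleus attraction terms
nuc_nuc   = comb(n_nuc, 2)      # nucleus-nucleus repulsion terms

print(variables)                        # 39
print(kinetic + e_e + e_nuc + nuc_nuc)  # 91
```

<p>Thirty-nine coupled coordinates and nearly a hundred terms, for one small molecule. 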
Any attempt to solve such a complicated Schrodinger equation head-on would be futile, since small perturbations to the initial conditions send us back to square one. To simplify the problem enormously, Max Born and J. Robert Oppenheimer treated the system as though the motion of the electrons is independent of the motion of the nuclei (removing the second summation of the above ugly expression), the justification being that nuclei are so heavy and so slow compared to electrons that their motion barely alters the electronic wavefunctions. In effect, the wavefunctions of electrons and nuclei are treated as separable, making the equation far easier to solve. </p><p>$$ \hat{H}_{electronic} \psi_{electronic} = E_{electronic} \psi_{electronic}$$</p><p>The Born-Oppenheimer approximation (also called the Adiabatic Model) is arguably the best mathematical approximation that scientists have come up with for molecules. Most quantitative theories in chemistry and solid state physics that deal with bonding and the formation of molecules use the Born-Oppenheimer approximation and match experimental results with striking accuracy. But of course that's boring. Let's instead talk about the rebels that disobey this rule, and try to think about why.</p><h2 id="case-1-carbon-nano-tubes">Case 1 : Carbon Nano-Tubes</h2><p>Certain kinds of lattices and solids fail to conform to the Born-Oppenheimer approximation. This is because they have a short vibrational period of oscillation (i.e. a high frequency of vibration). They vibrate fast enough for the nuclear motion to become significant, and the Born-Oppenheimer approximation starts falling apart. This phenomenon was first noticed by Walter Kohn in 1959 and is hence referred to as a Kohn Anomaly. 
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/04/200px-Lattice_wave.svg.png" class="kg-image" alt="When Born-Oppenheimer Fails"><figcaption>A collective excitation in a lattice (from <a href="https://upload.wikimedia.org/wikipedia/commons/2/27/Lattice_wave.svg">Wikipedia</a>)</figcaption></figure><p>Carbon nanotubes are one such example. They are small cylindrical tubes of carbon atoms, with radii measured in nanometers. They have a short period of oscillation and a long relaxation time (i.e. the time between successive collisions of electrons), which makes them ideal candidates to showcase Kohn Anomalies. Carbon nanotubes have a wide range of applications: physicists want to use them to test hypotheses about fundamental phenomena such as spin-orbit coupling, and engineers enjoy messing around with any material that has semiconductor properties. Hence it becomes important to develop models that go beyond the Born-Oppenheimer approximation.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/04/Illustration-carbon-nanotube.jpg" class="kg-image" alt="When Born-Oppenheimer Fails"><figcaption>Carbon nanotube illustration (from <a href="https://www.britannica.com/science/carbon-nanotube">Britannica</a>)</figcaption></figure><h2 id="case-2-our-eyes">Case 2 : Our Eyes</h2><p>Once the chemical process of vision was identified as the conversion of rhodopsin's cis-retinal to an all-trans form, chemists, biologists, physicists and engineers became interested in the peculiar dynamics of the underlying process. The photochemical excitation of rhodopsin is an ultrafast conversion of the photons' energy into chemical energy. 
These processes are non-adiabatic: they don't agree with the assumptions made in the Born-Oppenheimer approximation.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/04/unnamed.png" class="kg-image" alt="When Born-Oppenheimer Fails"><figcaption>The isomerisation process involved in vision (from <a href="http://photobiology.info/Crouch.html">Photobiology</a>)</figcaption></figure><h2 id="leaving-us-where">Leaving us...where?</h2><p>Interesting case studies like these show that no matter what approximations we make, nature always has a system in store that defies expectations and produces unusual results. Scientists have come up with interesting models and schemes to tackle the non-Born-Oppenheimer problem (a discussion of which would be far too theoretical for the purposes of this article); however, these schemes too rely on a good amount of approximation and simplification, leaving room for the discovery of more interesting anomalies and further refinement of the underlying theory.</p><p>As the statistician George Box once said, "All models are wrong, but some are useful". It is a sobering thought that each theory and approximation we come up with can be quickly defied by nature, yet it is astonishing that they work for most cases. 
</p><h2 id="references">References</h2><p>Breakdown of the BO approximation in SWCNTs: <a href="https://arxiv.org/pdf/0901.2947.pdf">https://arxiv.org/pdf/0901.2947.pdf</a><br><br>Non-adiabatic dynamics of vision: <a href="https://pubmed.ncbi.nlm.nih.gov/20864998/">https://pubmed.ncbi.nlm.nih.gov/20864998/</a><br><br>Cover image taken from an article on the breakdown of the Born-Oppenheimer approximation in the fluorine + deuterium reaction: <a href="https://science.sciencemag.org/content/317/5841/1061">https://science.sciencemag.org/content/317/5841/1061</a></p>]]></content:encoded></item><item><title><![CDATA[SUSY: The search for a potential partner]]></title><description><![CDATA[What if we big-brained Quantum mechanics?]]></description><link>https://elliptigon.com/susy-the-search-for-a-potential-partner/</link><guid isPermaLink="false">604dbd7c2ab5f40dde4347c6</guid><category><![CDATA[quantum mechanics]]></category><category><![CDATA[supersymmetry]]></category><category><![CDATA[SUSY]]></category><category><![CDATA[Emission Lines]]></category><category><![CDATA[physics]]></category><category><![CDATA[Radioactivity]]></category><category><![CDATA[Hydrogen Atom]]></category><dc:creator><![CDATA[Rishi Kumar]]></dc:creator><pubDate>Sat, 27 Mar 2021 22:00:00 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2021/03/Untitled-1.png" medium="image"/><content:encoded><![CDATA[<h1 id="why-susy-why-not-just-old-school-quantum-mechanics">Why SUSY? Why not just old school Quantum Mechanics?</h1><img src="https://elliptigon.com/content/images/2021/03/Untitled-1.png" alt="SUSY: The search for a potential partner"><p>Humans are innately lazy beings. Well, believe it or not, physicists are human beings too.</p><p>Modern-day science has slowly moved away from inventing new things toward refining theories that explain what we have already discovered. The two main objectives right now (and in recent history) for most physicists are accuracy and efficiency. 
We either formulate a method that performs calculations with high accuracy and precision, which often comes at the cost of complexity, or we find an approximation that is easily solvable and reasonably close to the actual value, though not always accurate.</p><p>Take the hydrogen atom problem in quantum mechanics, for example. The conventional method of solving for the wavefunction of the electron is sufficient for most physicists. You use the 3-D radial Schrodinger wave equation, substitute the potential for the electron, do a lot of algebra and voila, you obtain the wavefunction of the electron, and with that pretty much everything else you need to know about the quantum system.</p><p>Simple, right? Yes, but actually no. The substitution part, yes. The algebra part? Maybe not! It involves a lot of algebraic manipulation, partial differential equations, Laguerre polynomials and normalization, which might appear simple at a glance (as you will see in the next section), but can be a terrible pain when you actually attempt the problem (this is true for most problems in physics, by the way!). What if I told you that there is a method, completely legit, with no shady manipulation of terms or crazy assumptions (back off Engineers, this is a Physics article), that arrives at the same result in just three steps? Maybe it is not enough to make me (or you) run out of the shower mid-wash shouting "Eureka!" from my busy apartment into the even busier streets, but it is still a cool concept to look at.</p><p>This article assumes that you know the basics. Not just any basics, but the basics of quantum mechanics. The words Schrodinger equation, wavefunctions, potentials, operators and complex analysis should not invoke fear in you. 
If you already felt a little queasy reading the second paragraph, I recommend a bit of basic QM training before this read (don't say I did not warn you!)</p><h1 id="the-original-way-of-solving-the-hydrogen-atom-problem">The original way of solving the Hydrogen Atom problem</h1><p>Let us get really technical now (like how most physics books give an ABC introduction to differentiation and then jump into loop quantum gravity). Taking the radial Schrodinger wave equation, we have</p><p>$$ -\frac{\hbar^2}{2m}\frac{\partial^2 u}{\partial r^2} + \left[V(r) + \frac{\hbar^2}{2m}\frac{l(l+1)}{r^2}\right]u = Eu $$</p><p>Okay, do not fear, I will link all the websites you can visit to brush up on your rusty quantum mech. Beginning from scratch would turn this article into a thesis, which neither of us would like! So if you feel stranded at this point, scroll down, take a few minutes to see how we obtain the radial Schrodinger wave equation, and get back here (ignore all your other YouTube recommendations!).</p><p>Now, assuming you are enlightened with advanced quantum mechanics, we know that the potential of an electron, a negative charge orbiting a central, positive, stationary nucleus, is (in Gaussian units)</p><p>$$V(r)=-\frac{e^2}{r}$$</p><p>We simply substitute this into the radial wave equation to get</p><p>$$-\frac{\hbar^2}{2m}\frac{\partial^2 u}{\partial r^2} + \left[-\frac{e^2}{r} + \frac{\hbar^2}{2m}\frac{l(l+1)}{r^2}\right]u = Eu$$</p><p>Remember when I told you that there is a LOT of algebraic manipulation and calculus involved? This is where it begins. But do not worry, I am going to completely (yes, completely) breeze past the scary math and arrive at the results, or in other words, the allowed eigenvalues of the above radial wave equation. 
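</p><p>Before breezing past, it is worth noting that a computer can cross-check the answer directly. Here is a rough numerical sketch (using numpy; the grid parameters are illustrative choices of ours, not anything canonical) that discretizes the $l=0$ radial equation in atomic units and finds the lowest eigenvalue:</p>

```python
import numpy as np

# Finite-difference solve of the l = 0 radial equation in atomic units
# (hbar = m = e = 1): -u''/2 - u/r = E u, with u(0) = u(r_max) = 0.
N, r_max = 1500, 50.0
h = r_max / (N + 1)
r = h * np.arange(1, N + 1)

# Tridiagonal Hamiltonian: 3-point second derivative plus the -1/r potential.
H = (np.diag(1.0 / h**2 - 1.0 / r)
     + np.diag(-0.5 / h**2 * np.ones(N - 1), 1)
     + np.diag(-0.5 / h**2 * np.ones(N - 1), -1))

E0 = np.linalg.eigvalsh(H)[0]   # lowest eigenvalue, in Hartree
print(round(E0 * 27.2114, 2))   # close to -13.6 (eV)
```

<p>The lowest eigenvalue lands on the celebrated $-13.6$ eV, up to discretization error. 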
If your inner physicist demands an explicit derivation (or you decide to hate yourself), the links are down below.</p><p>Now then, the allowed eigenvalues of the radial wave equation of the electron are</p><p>$$ E_n=-\frac{me^4}{2\hbar^2n^2}=-\alpha^2mc^2\left(\frac{1}{2n^2}\right)=-13.6\,\mathrm{eV}/n^2 $$ </p><p>Where $n=1,2,3,\dots$ and, </p><p>$$ \alpha\equiv\frac{e^2}{\hbar c}\approx\frac{1}{137.036}$$</p><p>Why are these the only allowed eigenvalues? It's because these are the only values for which you obtain normalisable, or in other words, sensible solutions from the equation. Non-normalisable solutions imply probabilities that do not add up to one, which is just bizarre (for now). The $\alpha$ is called the fine structure constant.</p><p>We have obtained the allowed energies; now we should get the normalized wavefunction. By this point, we have already obtained the ground state energy of the electron to be -13.6 eV, and a way to calculate the n-excited states. This should be enough, but to stress my point about this method being complex, I am going to introduce this behemoth equation (out of thin air),</p><p>$$\psi_{n,l,m_l}(r,\theta,\phi)=\left\{\left(\frac{2}{na}\right)^3\frac{(n-l-1)!}{2n[(n+l)!]^3}\right\}^{1/2}\\ e^{-r/na}\left(\frac{2r}{na}\right)^l L^{2l+1}_{n-l-1}\left(\frac{2r}{na}\right)Y^{m_l}_l(\theta,\phi)$$</p><p>What is this? This is the normalized wavefunction of the hydrogen atom, where the $Y^{m_l}_l(\theta,\phi)$ are called spherical harmonics, and the $ L^{2l+1}_{n-l-1}\left(\frac{2r}{na}\right)$ is called the associated Laguerre polynomial, which is explicitly given by,</p><p>$$L^p_{q-p}(z)\equiv (-1)^p\left(\frac{d}{dz}\right)^p \left[e^z \left(\frac{d}{dz}\right)^q (e^{-z} z^q)\right]$$</p><p>Here the (ordinary) Laguerre polynomial is a solution of the second order linear differential equation of the form</p><p>$$xy''+(1-x)y'+ny=0$$</p><p>Okay, time for the inner physicist to calm down. By now, you should have got my point. 
This method is hard. If you still do not feel this way, congratulations, you are fit to be a quantum physicist and suffer for the rest of your life. Now, let us look at a much simpler (relatively) way of solving the same problem, using an amazing technique called SUSY.</p><h1 id="who-or-more-fittingly-what-is-susy">Who (or more fittingly What) is SUSY?</h1><p>First up, SUSY is not the name of a female physicist (or model). It is an abbreviation for the term Supersymmetry (clickbait? I think not. It's a physics article!). Supersymmetry is a mathematical concept which arose from theoretical arguments and led to an extension of the Standard Model (SM) as an attempt to unify the forces of nature. It is a symmetry which relates fermions (half-integer spin) and bosons (integer spin) by transforming fundamental particles into superpartners with the same mass and a spin differing by one half (finally, the title makes sense!).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/03/Untitled-2.png" class="kg-image" alt="SUSY: The search for a potential partner"><figcaption>Supersymmetry in Particle Physics</figcaption></figure><p>The above image is focused on supersymmetry in particle physics, but for SUSY in quantum mechanics we try to determine partner potentials and factorize our Hamiltonians, to make our lives simpler. But why, you may ask. It is simple to deduce that the Hamiltonian in the Schrodinger wave equation,</p><p>$$H_1=-\frac{\hbar^2}{2m}\frac{d^2}{dx^2}+V_1(x)$$</p><p>defines a second order differential equation. And everyone knows that second order differential equations (and well, anything else) are much harder to solve than first order differential equations. We first assume that the ground state wavefunction $\psi_0$ is normalizable. This means that $\psi_0$ vanishes at $x=\pm\infty$. 
Using this simple assumption (and shifting the energy scale so that the ground state energy is zero), we rewrite the Schrodinger Equation for $\psi_0$ as,</p><p>$$0=-\frac{\hbar^2}{2m}\frac{d^2\psi_0}{dx^2}+V_1(x)\psi_0$$</p><p>Since we know that the ground state is nodeless, we can solve the Schrodinger Equation for the potential,</p><p>$$V_1(x)=\frac{\hbar^2}{2m}\frac{\psi_0^{''}(x)}{\psi_0(x)}$$</p><p>From here, we begin to factorize the Hamiltonian as follows,</p><p>$$H_1=A^\dagger A$$</p><p>Where, </p><p>$$ A=\frac{\hbar}{\sqrt{2m}}\frac{d}{dx}+W(x) $$</p><p>$$A^\dagger=-\frac{\hbar}{\sqrt{2m}}\frac{d}{dx}+W(x)$$</p><p>Where $W(x)$ is referred to as the superpotential (being frank, there is nothing super about this potential, but what else would you call it?).</p><p>We know our regular potential, and the factorization defines our superpotential. Next, we try to establish the relationship between these two potentials. To do this, we simply substitute the operators above into our Hamiltonian,</p><p>$$H_1\psi(x)=\left(-\frac{\hbar}{\sqrt{2m}}\frac{d}{dx}+W(x)\right)\left(\frac{\hbar}{\sqrt{2m}}\frac{d}{dx}+W(x)\right) \psi(x)$$</p><p>Simplifying this, we get,</p><p>$$V_1(x)=W^2-\frac{\hbar}{\sqrt{2m}}W'(x)$$</p><p>This is called the Riccati equation (it is indeed named after an Italian mathematician, Jacopo Riccati). Note that this equation is of the desired first order. No nasty double differentiation. Now the partner Hamiltonian is obtained by reversing the order of the factors,</p><p>$$H_2=AA^\dagger$$</p><p>and using similar calculations,</p><p>$$V_2(x)=W^2+\frac{\hbar}{\sqrt{2m}}W'(x)$$</p><p>This is called the supersymmetric partner potential. 
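</p><p>Before moving on, the two relations above are easy to sanity-check symbolically. Here is a minimal sketch (assuming Python with sympy is available), using the harmonic oscillator superpotential $W(x)=\sqrt{m/2}\,\omega x$ as a made-up test case:</p>

```python
import sympy as sp

x = sp.symbols('x', real=True)
hbar, m, omega = sp.symbols('hbar m omega', positive=True)

# a sample superpotential: the harmonic oscillator's W(x) = sqrt(m/2) * omega * x
W = sp.sqrt(m / 2) * omega * x

# the two partner potentials: V1 = W^2 - (hbar/sqrt(2m)) W', V2 = W^2 + (hbar/sqrt(2m)) W'
V1 = W**2 - hbar / sp.sqrt(2 * m) * sp.diff(W, x)
V2 = W**2 + hbar / sp.sqrt(2 * m) * sp.diff(W, x)

# V1 is the shifted harmonic oscillator, and the partners differ by the constant hbar*omega
assert sp.simplify(V1 - (m * omega**2 * x**2 / 2 - hbar * omega / 2)) == 0
assert sp.simplify(V2 - V1 - hbar * omega) == 0
print("partner potentials check out")
```

<p>The constant offset $\hbar\omega$ between the partners is exactly the level spacing of the oscillator, a first hint of why these factorizations reproduce energy spectra. </p><p>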
Very simple so far, right?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/03/Untitled--1-.png" class="kg-image" alt="SUSY: The search for a potential partner"><figcaption>The relationship between Partner Hamiltonians</figcaption></figure><h1 id="using-susy-to-solve-the-hydrogen-atom-problem">Using SUSY to solve the Hydrogen atom problem</h1><p>At this point, you may have called my bluff about this entire SUSY QM method being far easier for solving problems in Quantum mechanics. But here is the catch- the previous section was for laying the foundations of supersymmetry. It is equivalent to establishing the Schrodinger equation in quantum mechanics. Almost everyone studying physics can recall the Schrodinger equation, but very few can actually "derive" it, because it is constructed almost purely by heuristic arguments, with not much of a mathematical basis.</p><p>The previous section was just building the tools from scratch, a polished screwdriver being made from iron ore. Now that you have the screwdriver in hand, you can screw conventional quantum mechanics for solving problems.</p><p>Again, substituting the Coulomb potential into the Schrodinger wave equation,</p><p>$$-\frac{\hbar^2}{2m}\frac{\partial^2 u}{\partial r^2} + \left[V(r) + \frac{\hbar^2}{2m}\frac{l(l+1)}{r^2}\right]u(r) = E_0u(r)$$</p><p>This is essentially the same equation as we saw before, just with slightly different notation (from here on we switch to SI units, so the Coulomb potential carries a factor of $4\pi\epsilon_0$). 
Now, we get something called the shifted potential (the original effective potential with the ground state energy $E_0$ subtracted), which is given by,</p><p>$$\tilde{V}(r)= -\left[\frac{e^2}{4\pi\epsilon_0}\right]\frac{1}{r}+\left[\frac{\hbar^2l(l+1)}{2m}\right]\frac{1}{r^2}-E_0$$</p><p>Now, we use our Riccati equation to find a superpotential satisfying,</p><p>$$\tilde{V}(r)=W(r)^2-\frac{\hbar}{\sqrt{2m}}W'(r)$$</p><p>Here, we make the ansatz (essentially a fancy assumption about the form of an unknown function),</p><p>$$W(r)=C-\frac{D}{r}$$</p><p>Comparing this ansatz with the previous equations, and doing some simplification,</p><p>$$W(r)=\frac{\sqrt{2m}}{\hbar}\frac{e^2}{2\cdot 4\pi\epsilon_0(l+1)}-\frac{\hbar(l+1)}{\sqrt{2m}}\frac{1}{r}$$</p><p>Compare the radial equation with the equation above. We can see that the ground state energy $E_0$ is simply $-C^2$, and written explicitly (the numerical values are for $l=0$),</p><p>$$E_0=-C^2=-\frac{e^4}{4\cdot 16\pi^2\epsilon_0^2(l+1)^2}\frac{2m}{\hbar^2}=-2.18\cdot10^{-18}\,\mathrm{J}\approx -13.6\,\mathrm{eV}$$</p><p>We see that in a few steps, we have successfully obtained the ground state energy of the Hydrogen atom. Does it stop here? Not yet. We can now find the partner potential $V_2$, using the formula we derived in the previous section,</p><p>$$V_2(r)=-\left[\frac{e^2}{4\pi\epsilon_0}\right]\frac{1}{r}+\left[\frac{\hbar^2(l+1)(l+2)}{2m}\right]\frac{1}{r^2}+\left[\frac{e^4m}{32\pi^2\hbar^2\epsilon_0^2(l+1)^2}\right]$$</p><p>So far, we have obtained our superpotential, its partner potential, and the ground state energy. What more do we need? Before asking that question, you should ask what the exact use of the partner potential is. Remember when I mentioned that the superpotential and the partner potentials have a relationship (that's why they are called "partner" potentials, or else they should simply be named "crush" potentials!)? 
Now that relationship is characterized by shape invariance, or in mathematical terms,</p><p>$$V_2(x;a_1)=V_1(x;a_2)+R(a_1)$$</p><p>This essentially means that the two potentials have the same shape, related by a change of parameter from $a_1$ to $a_2$ and a constant remainder $R(a_1)$.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/03/sip.png" class="kg-image" alt="SUSY: The search for a potential partner"><figcaption>Visualizing Shape Invariant Potentials (SIPs)</figcaption></figure><p>That makes much more sense, doesn't it? Now, we can use this to calculate the exact remainder between the two potentials. The relationship between the parameters is,</p><p>$$a_2=f(a_1)\Rightarrow f(l)=l+1$$</p><p>So the parameter shifts by $l \to l+1$, and comparing $V_2(r;l)$ with $V_1(r;l+1)$ gives the remainder $R(l)$,</p><p>$$R(l)=\frac{e^4m(2l+3)}{32\pi^2\hbar^2\epsilon_0^2(l+1)^2(l+2)^2}$$</p><p>Here comes the cool part- this remainder is the energy gap between the ground state and the first excited state, then between the first and the second, and so on. All you need to do to get every other excited state in the spectrum of the Hydrogen atom is to add the successive remainders $R(l), R(l+1),\dots,R(l+n-1)$ to the ground state energy $E_0$ to get the $n$th excited state, $E_n$. 
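</p><p>This ladder is easy to check numerically. Here is a small sketch (assuming Python; the constants are rounded CODATA values in SI units) that starts from $E_0$ and climbs by adding successive remainders:</p>

```python
import math

# rounded physical constants (SI units)
m_e, e = 9.109e-31, 1.602e-19        # electron mass (kg), elementary charge (C)
hbar, eps0 = 1.055e-34, 8.854e-12    # reduced Planck constant (J s), vacuum permittivity (F/m)
eV = 1.602e-19                       # joules per electronvolt

# K = e^4 m / (32 pi^2 hbar^2 eps0^2), the 13.6 eV scale appearing in E_0 and R(l)
K = m_e * e**4 / (32 * math.pi**2 * hbar**2 * eps0**2)

def R(l):
    # remainder between the shape-invariant partner potentials
    return K * (2 * l + 3) / ((l + 1) ** 2 * (l + 2) ** 2)

l = 0
E = -K / (l + 1) ** 2                # ground state energy E_0
for n in range(4):
    print(f"E_{n} = {E / eV:6.2f} eV   (Bohr formula: {-13.6 / (n + 1) ** 2:6.2f} eV)")
    E += R(l + n)                    # add the next remainder to climb one level
```

<p>The printed ladder matches the familiar Bohr spectrum ($-13.6$, $-3.4$, $-1.5\,\mathrm{eV}$, ...) to within rounding of the constants. </p><p>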
So for the first excited state $E_1$,</p><p>$$E_1=-\frac{e^4m}{32\pi^2\hbar^2\epsilon_0^2(l+1)^2}+\frac{e^4m(2l+3)}{32\pi^2\hbar^2\epsilon_0^2(l+1)^2(l+2)^2}$$</p><p>Now generalizing this to the $n$th excited state,</p><p>$$E_n=E_0+\sum_{i=1}^{n}\frac{e^4m(2(l+i)+1)}{32\pi^2\hbar^2\epsilon_0^2(l+i)^2(l+i+1)^2}$$</p><p>Taking $l=0$, the sum telescopes to,</p><p>$$E_n=-\frac{e^4m}{32\pi^2\hbar^2\epsilon_0^2(n+1)^2}$$</p><p>This is the known formula for the energy states of the Hydrogen atom (not so surprisingly), with $n=0$ labelling the ground state.</p><h1 id="finally-the-conclusion-">Finally, the conclusion!</h1><p>Okay, so that was an exhausting read, made even worse by the bad jokes and puns sprinkled throughout. Well, it was my best attempt to make something as tedious as supersymmetric quantum mechanics as readable as possible. Whether it was a success or a miserable failed attempt is for the reader to judge (although if you reached this point in the article, I would call that a win).</p><p>Summarizing, we briefly glanced at the conventional way to solve the Hydrogen atom problem in quantum mechanics, and I stress briefly, because around 90% of the derivations were omitted, for obvious reasons (the topic takes up an entire chapter in Griffiths). If you understood what we were doing, you will definitely have appreciated how efficient AND accurate the supersymmetry method is in this case, and for many more problems in quantum mechanics.</p><p>Do keep in mind that supersymmetry in quantum mechanics is just the icing on the cake; SUSY's main applications are in the field of particle physics, where it is a major contender in the search for a Grand Unified Theory. It is not entirely foolproof though, as the hypothesis demands that the symmetry, if it ever existed, has since been broken, and whether it existed at all is still a mystery. 
This is the same problem that affects most theories in Physics- no theory entirely completes the big puzzle, because the puzzle is nearly infinite, and currently we do not know where its boundaries are, let alone how to finish it. Every time we discover something, we become more confused than we were before. Maybe the law of entropy also applies to confusion- it always seems to increase in the field of Physics!</p><p>But maybe that is why Physics is so beautiful, right?</p><h1 id="acknowledgements">Acknowledgements</h1><p>Firstly, I would like to thank Pugazh (no, he did not pay me for this) for constantly threatening me to complete this article. I definitely owe a lot to my college professors, especially Prof. Joseph Prabagar (you might have seen his name a few times in previous articles, yes, he is awesome) for igniting the curiosity of quantum mechanics in the boring old me.</p><h1 id="references">References</h1><ol><li>David J. Griffiths, Darrell F. Schroeter - Introduction to Quantum Mechanics (2018)</li><li>Fred Cooper, Avinash Khare and Uday Sukhatme - Supersymmetry in Quantum Mechanics (World Scientific Publishing Co. Pte. Ltd., 2001)</li><li>R. 
Shankar - Principles of Quantum Mechanics (2013, Springer)</li><li>Jakob Schwichtenberg - Physics from Symmetry (Undergraduate Lecture Notes in Physics, Second Edition, Springer)</li></ol><h1 id="links-for-study">Links for study</h1><ol><li><a href="http://www.wiese.itp.unibe.ch/theses/nygren_bachelor.pdf">http://www.wiese.itp.unibe.ch/theses/nygren_bachelor.pdf</a></li><li><a href="https://www.youtube.com/watch?v=KfbvrGt3MlI">https://www.youtube.com/watch?v=KfbvrGt3MlI</a></li><li><a href="https://www.youtube.com/watch?v=0CeLRrBAI60">https://www.youtube.com/watch?v=0CeLRrBAI60</a></li></ol>]]></content:encoded></item><item><title><![CDATA[How much do you really know?]]></title><description><![CDATA[Information, is it even mathematically definable?]]></description><link>https://elliptigon.com/how-much-do-you-really-know/</link><guid isPermaLink="false">6017ed032ab5f40dde434708</guid><category><![CDATA[Information]]></category><category><![CDATA[Shanon]]></category><dc:creator><![CDATA[Pugazharasu Anancia Devaneyan ]]></dc:creator><pubDate>Sat, 20 Mar 2021 22:00:00 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2021/03/question-mark-2492009_1920.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://elliptigon.com/content/images/2021/03/question-mark-2492009_1920.jpg" alt="How much do you really know?"><p>Raise your hand if you know this guy:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/03/Roberts-Claude-Shannon.jpg" class="kg-image" alt="How much do you really know?"><figcaption>From the <a href="https://www.newyorker.com/tech/annals-of-technology/claude-shannon-the-father-of-the-information-age-turns-1100100">New Yorker</a></figcaption></figure><p>If you don't, shame on you. His name is Claude E. Shannon. 
He's a genius.</p><p>The first time I was introduced to his work, it lit my mind on fire.</p><figure class="kg-card kg-image-card"><img src="https://images.unsplash.com/photo-1500670602153-5e2dd3c75f20?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ" class="kg-image" alt="How much do you really know?"></figure><p>Because Claude Shannon did something incredible–he defined what information is.</p><p>And I'm not talking about some vague philosophical definition involving the mind, thought, perception, and the meaning of life. He literally wrote it down:</p><p>$$I(X) = -\log{\mathbb{P}(X)}$$</p><h2 id="information">Information</h2><p>Unless you're already familiar with information theory, that probably doesn't make sense yet. So let's do a quick breakdown:</p><ul><li>$I(X)$ measures the information (more specifically, it's called the <em>self-information</em>) of an event $X$.</li><li>$\log$ is, well, just a regular logarithm function. There's an interesting piece of information (see what I did there) surrounding the choice of the base for the $\log$, which I'll talk about a little later. But for simplicity, assume that it's base $e$, which would make $\log$ represent the natural logarithm.</li><li>$\mathbb{P}(X)$ is the probability of the event $X$ occurring.</li></ul><p>So why is this a good definition for information?</p><p>Well, think about some of the properties of a good definition of information.</p><p>Ideally, events that are pretty normal and naturally occurring (high probability) should carry less "information" than rare events.</p><p>For example, the sun rises every day. What do you learn from that? 
Not much.</p><p>But suppose you look up and see this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/03/2017_Solar_Eclipse_Weiser_Idaho.jpg" class="kg-image" alt="How much do you really know?"><figcaption>The Moon <a href="https://en.wikipedia.org/wiki/solar_eclipse">blocking</a> the Sun during the <a href="https://en.wikipedia.org/wiki/solar_eclipse_of_August_21,_2017">total eclipse of August 21, 2017</a> from <a href="https://commons.wikimedia.org/wiki/File:2017_Solar_Eclipse_Weiser_Idaho.jpg">Wikimedia</a></figcaption></figure><p>That's a solar eclipse. It doesn't happen every day (at least, not on earth). Just by noticing the eclipse, you'd know the relative positions of the sun, moon, and earth.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/03/2000px-Solar_eclipse_visualisation.svg.png" class="kg-image" alt="How much do you really know?"><figcaption>Visualisation of a solar eclipse from different positions. Each icon shows the view from the centre of its black spot, representing the moon. The magnitude values pertain to the middle icons from <a href="https://commons.wikimedia.org/wiki/File:Solar_eclipse_visualisation.svg">Wikimedia</a></figcaption></figure><p>In other words, a solar eclipse happening gives you more "information" than the sun rising.</p><p>Does this property of high-probability events having low information check out with Claude Shannon's definition? Take a look at the plot of $y = -\log{x}$: it shoots off to infinity as $x \to 0$ and drops to zero at $x = 1$.</p><p>So yes, $I(X) = -\log{\mathbb{P}(X)}$ looks good.</p><h3 id="what-does-the-log-say">What Does The $\log$ Say?</h3><p>If you're wondering what the base of the logarithm signifies, wonder no more.</p><p>When we're measuring information, we need some common unit to measure how much information there is. 
Say, for example, we want to use bits.</p><p>Here's the cool part: since bits are the number of base-2 digits required, using base-2 for the logarithm gives you $I(X)$ in bits.</p><p>Yes, the very same bits that your computer uses to measure how much it has to work to load those YouTube videos that you couldn't stop watching last night.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2021/03/tv-1945130_960_720.jpg" class="kg-image" alt="How much do you really know?"></figure><p>To illustrate how amazing this idea is, I'm going to calculate the information produced by Shaq scoring a three-pointer.</p><p>Shaq has scored exactly 1 three (<a href="https://www.basketball-reference.com/players/o/onealsh01.html">Source</a>) out of the 22 attempts in his career.</p><p>$$ I(\text{Shaq scoring a three}) = -\log_2{\frac{1}{22}} $$</p><p>$$ \approx 4.459 $$</p><p>About 4 or 5 bits. Which, to be honest, is far lower than I thought it would be.</p><h2 id="entropy">Entropy</h2><p>There's a small weakness with our definition of information– it only accounts for single events.</p><p>But single samples (or events) don't always tell you the full story.</p><p>Take a look at these 2 distributions: picture the same bell curve drawn twice, a blue one centred at $-2$ and a red one shifted along the axis by a constant.</p><p>Let's call the random variable represented by the blue distribution $B$ and the random variable represented by the red distribution $R$.</p><p>Now let's say we sample from both distributions, and get a value of $-2$ for both $R$ and $B$.</p><p>Since we know the distribution of the random variables $B$ and $R$, we can calculate the information "released" by observing that they are both $-2$.</p><p>But there's a catch. According to the distribution, it's pretty likely that you'll get $-2$ if you sample $B$. In fact, it's the mean. 
When you look at the distribution of $R$, however, the probability of getting $-2$ is much lower.</p><p>So when we try to calculate information, we'll get that</p><p>$$ \mathbb{P}(R=-2) &lt; \mathbb{P}(B=-2) $$</p><p>$$ \implies I(R=-2) &gt; I(B=-2) $$</p><p>Ok, this kind of makes sense. It agrees with what we've established so far. But our conclusion about $R$ giving more information cannot extend to the entire distribution, only to that one particular case where $B=-2$ and $R=-2$.</p><p>That sounds pretty useless then, right? Because claiming that sampling from $R$ would give you more information is completely ridiculous. $B$ and $R$ differ only by a constant. So they should theoretically have some quantity in common.</p><p>If you didn't follow, here's what I mean: Say you have 2 dice (because even though physicists have proved the convergence of Cauchy sequences in infinite dimensional Hilbert spaces and calculated the curvature of differentiable Riemann manifolds, the motion of a 6-faced regular solid is beyond the comprehension of physics and is therefore, 100%, unequivocally, random).</p><p>Both of them are fair, but they've been numbered differently. One die is numbered from 1-6, while the other is numbered from 3-8.</p><figure class="kg-card kg-image-card"><img src="https://images.unsplash.com/photo-1519744346361-7a029b427a59?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ" class="kg-image" alt="How much do you really know?"></figure><p>This sort of models the situation above. Both distributions are separated only by a constant, so sampling from both distributions (rolling the dice) would, on average, produce the same amount of information.</p><p>If you didn't get the gist already, Claude Shannon was a genius, so he figured this out too. 
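</p><p>To make the dice claim concrete, here is a quick sketch (assuming Python) computing the expected information of both dice:</p>

```python
import math

def self_information(p):
    # I(X) = -log2(P(X)), measured in bits
    return -math.log2(p)

def expected_information(dist):
    # average information of a distribution: sum_i P(x_i) * I(x_i)
    return sum(p * self_information(p) for p in dist.values())

die_one = {face: 1 / 6 for face in range(1, 7)}  # fair die numbered 1-6
die_two = {face: 1 / 6 for face in range(3, 9)}  # fair die numbered 3-8

print(expected_information(die_one))  # about 2.585 bits
print(expected_information(die_two))  # identical: relabeling the faces changes nothing
```

<p>Both dice come out at about 2.585 bits per roll: the quantity they share does not care about the labels on the faces, only about the probabilities. </p><p>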
He defined a quantity which is common for both distributions– Entropy.</p><figure class="kg-card kg-image-card"><img src="https://images.unsplash.com/photo-1467139840664-96b244a66825?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ" class="kg-image" alt="How much do you really know?"></figure><p>Again, this isn't some vague and abstract definition like those pop-science videos on YouTube that just can't seem to stop saying "Entropy is chaos".</p><p>No. Claude E. Shannon wrote the darn thing down:</p><p>$$ H = -\sum_i{\mathbb{P}(X=x_i) \log{\mathbb{P}(X=x_i)}} $$</p><p>Where $H$ represents entropy, and the random variable $X$ takes on the values $x_i$.</p><p>In the continuous case,</p><p>$$ H = -\int_{\mathbb{R}}{p(x) \log{p(x)}}\,dx $$</p><p>Where $p(x)$ is the probability density function of a distribution.</p><p>Let's break that down. From now on, I'll focus on the discrete case.</p><p>We know that the information produced by individual events sampled from two distributions messes the math up, so what else might work?</p><p>We <em>could</em> try to find the information released by obtaining the mean (or median or mode) from a single random sample.</p><p>But again, the keyword there is <em>single</em> sample. So we still wouldn't get clues about the distribution as a whole.</p><p>What we'll use is the expected information. Which is exactly what it says it is– the expected value of information.</p><p>$$ \mathbb{E}[I(X)] $$</p><p>And we know how to calculate expected values, at least as an average over a large number of samples!</p><p>$$ \mathbb{E}[I(X)] \approx \frac{\sum_j{I(X_j)}}{N}$$</p><p>So what's the total amount of information?</p><p>Let's see, event $x_i$ occurs with probability $\mathbb{P}(X = x_i)$. If we take $N$ samples, $x_i$ happens about $N_i = \mathbb{P}(X=x_i)N$ times. 
So,</p><p>$$ \mathbb{E}[I(X)] \approx \frac{\sum_j{I(X_j)}}{N}$$</p><p>$$ = \frac{\sum_i{N_i I(x_i)}}{N} $$</p><p>$$ = \frac{\sum_i{\mathbb{P}(X=x_i) N I(x_i)}}{N} $$</p><p>$$ = \sum_i{\mathbb{P}(X=x_i) I(x_i)} $$</p><p>$$ = -\sum_i{\mathbb{P}(X=x_i) \log{\mathbb{P}(X=x_i)}}$$</p><p>(In the first line the sum runs over the $N$ individual samples; grouping equal outcomes together turns it into a sum over the possible values $x_i$.)</p><p>So I guess Claude Shannon, in a really meta way, didn't show that he knows a lot. He mathematically proved it.</p>]]></content:encoded></item><item><title><![CDATA[Understanding Uncertainty]]></title><description><![CDATA[What can we really know to be certain?]]></description><link>https://elliptigon.com/understanding-uncertainty/</link><guid isPermaLink="false">600c36552ab5f40dde4344cd</guid><category><![CDATA[physics]]></category><category><![CDATA[quantum mechanics]]></category><category><![CDATA[uncertainty]]></category><dc:creator><![CDATA[Pugazharasu Anancia Devaneyan ]]></dc:creator><pubDate>Sat, 13 Mar 2021 23:00:00 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2021/02/vlcsnap-2021-02-28-21h38m48s182.png" medium="image"/><content:encoded><![CDATA[<img src="https://elliptigon.com/content/images/2021/02/vlcsnap-2021-02-28-21h38m48s182.png" alt="Understanding Uncertainty"><p>The Heisenberg Uncertainty Principle, although well known in the pop science genre, is not understood mathematically by most. </p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2021/02/4zstlb.png" class="kg-image" alt="Understanding Uncertainty"></figure><p>This is largely due to the fact that the Quantum realm is thought of as the Pandora's box of physics. In this article we'll explain the uncertainty principle using just two postulates of Quantum Mechanics. The only prerequisites are familiarity with matrices and complex number algebra, as the rest is derived from there onward. Eigen-stuff and the Cauchy-Schwarz inequality will be used to introduce the topic. 
Disclaimer: This is only intended to be an introduction to Quantum Mechanics, and a conversation starter, as the topic is much more math-heavy when studied in detail.</p><h1 id="quantum-mech-101">Quantum Mech 101</h1><p>When we use the word <em>"Postulate"</em>, we mean that it is a principle from which the rest of the theory can be constructed. So here are the postulates for Quantum Mechanics that we will consider.</p><h2 id="the-state-vector">The State Vector</h2><p>The state vector is an object represented as</p><p>$$| \psi \rangle$$</p><p>This is known as a "Ket" or column vector. We can extract maximal information (i.e. as much as we can, which is not necessarily everything) about the system by applying operations to it; thus there is by nature an unpredictability about the future of the system. This is in stark contrast (<a href="https://arxiv.org/abs/1909.04514">or maybe not</a>) to classical physics, where knowing the state of something corresponds to knowing everything that can be known about it. The obvious question to ask is whether the unpredictability is due to an incompleteness in the quantum state, or due to <em>hidden variables</em> that are inaccessible to us. There are various opinions about this matter; it is still an <a href="https://plato.stanford.edu/entries/qt-issues/#QuesQuanStatReal">open issue</a>. However, for now we'll act as if there is an inherent unpredictability (despite newer theories having deterministic features). This approach is called the "Copenhagen interpretation". 
Another interesting thing that we could do is express the state vector as a superposition of other states:</p><p>$$| \psi\rangle = \alpha |\psi_1 \rangle + \beta |\psi_2 \rangle$$</p><p>Provided the complex numbers $\alpha$ and $\beta$ satisfy the condition $$\alpha {\alpha}^{*} + \beta {\beta}^{*} = 1$$</p><p>they are said to be normalized. We impose this condition because the products $\alpha{\alpha}^{*}$ and $\beta{\beta}^{*}$ represent the probabilities of the respective outcomes, and we want to ensure that all the probabilities add up to 1. The $*$ symbol represents complex conjugation. Similarly we can introduce a row vector called the "Bra"; however, since the elements are complex numbers, we need to conjugate them as well</p><p>$$ \langle\psi | = |\psi\rangle^{\dagger} = {(|\psi\rangle^{*})}^{T} = \begin{pmatrix}\psi^{*}_1  \ \psi^{*}_2 \end{pmatrix} $$</p><p>Where $T$ represents the transposition of a matrix and the $\dagger$ (dagger) represents complex conjugation followed by transposition. We can write,</p><p>$$\langle \psi | \psi \rangle = \begin{pmatrix} \psi^{*}_1 \ \psi^{*}_2 \end{pmatrix}\begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix}$$</p><p>$$\langle \psi | \psi \rangle = \psi_1 \psi^{*}_1 +  \psi_2\psi^{*}_2$$</p><p>For a general inner product between two different states this can be negative or even complex, so we take its absolute value; and to interpret it as a probability, we square it. This is analogous to finding the square of the length of the vector representing $\psi$,</p><p>$$|\langle \psi |\psi\rangle | ^{2} = 1$$</p><p>This is called the <strong>"Born rule",</strong> as it was first suggested by Max Born.</p><h2 id="observables">Observables</h2><p><em>We can apply operations to the state vector to find out what happens to measurable quantities such as momentum and position. 
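</em></p><p>The bra-ket bookkeeping above is easy to play with on a computer. Here is a minimal sketch (assuming Python with numpy; the amplitudes are made-up values, not anything physical):</p>

```python
import numpy as np

# a two-component ket with made-up complex amplitudes
ket = np.array([[3 + 4j],
                [1 - 2j]])
ket = ket / np.linalg.norm(ket)      # normalize so the probabilities sum to 1

bra = ket.conj().T                   # the bra: complex conjugation then transpose
inner = (bra @ ket).item()           # <psi|psi>

print(abs(inner) ** 2)                            # |<psi|psi>|^2, equal to 1 up to rounding
print(abs(ket[0, 0]) ** 2 + abs(ket[1, 0]) ** 2)  # alpha*alpha^* + beta*beta^*, also 1
```

<p>Both printed numbers equal 1 up to floating point rounding, which is exactly the normalization condition above. </p><p><em>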
The operations are termed <strong>operators</strong> and are represented using matrices</em></p><p>$$\hat{O} = \begin{pmatrix} O_{11} &amp; O_{12}\\ O_{21} &amp; O_{22} \end{pmatrix}$$</p><p>These operators are linear, that is, they follow the properties:</p><p>$$\hat{O} (\alpha |\psi\rangle) = \alpha \hat{O} |\psi\rangle$$</p><p>$\forall \ \alpha \in \mathbb{C}$. Here $\forall$ indicates "for all" and $\in$ indicates membership in a set. Read in continuity, they mean "for all values $\alpha$ in $\mathbb{C}$, the set of all complex numbers".</p><p>$$(\hat{O}_1 + \hat{O}_2)|\psi\rangle = \hat{O}_1|\psi\rangle+ \hat{O}_2 |\psi\rangle$$</p><p>For example the momentum operator is given by the equation</p><p>$$\hat{P} = -i \hbar \frac{\partial}{\partial x}$$</p><p>where $i$ is the imaginary unit, and $\hbar = \frac{h}{2 \pi}$ is a physical constant ($h$ is called Planck's constant) with the dimensions of action, i.e. energy multiplied by time. So if we look at how it operates, it is easy to see that</p><p>$$\hat{P}|\psi\rangle = -i \hbar \frac{\partial | \psi \rangle}{\partial x}$$</p><h1 id="math-interlude-matrices">Math interlude: Matrices</h1><h2 id="matrix-multiplication">Matrix Multiplication</h2><p>When we multiply two numbers $A$ and $B$, for instance, we have the property</p><p>$$AB = BA$$</p><p>this is called commutativity. However, when we deal with operators we're dealing with matrices, so in general</p><p>$$\hat{A}\hat{B} \neq \hat{B}\hat{A}$$</p><p>But in a few cases two operators can commute, thus we use a measure called the <em>"Commutator"</em> to quantify whether or not two operators commute</p><p>$$[\hat{A}, \hat{B}] = \hat{A}\hat{B} - \hat{B}\hat{A}$$</p><p>Thus, if the commutator is 0 then the operators commute; if not, then they don't. 
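</p><p>The commutator is easy to try out numerically. Here is a small sketch (assuming Python with numpy), using the Pauli matrices, a standard example of operators that do not commute:</p>

```python
import numpy as np

def commutator(A, B):
    # [A, B] = AB - BA
    return A @ B - B @ A

# the Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

print(commutator(sx, sy))                        # nonzero: sx and sy do not commute
print(np.allclose(commutator(sx, sy), 2j * sz))  # in fact [sx, sy] = 2i * sz
print(commutator(sx, sx))                        # every operator commutes with itself
```

<p>The first commutator is nonzero (it equals $2i\hat{\sigma}_z$), while the last is the zero matrix. </p><p>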
Moreover we can represent the multiplication of a Bra and a Ket as,</p><p>$$\langle X|Y \rangle = \langle X | \  |Y\rangle = \begin{pmatrix} x^{*}_1 \ x^{*}_2 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$$</p><p>and it is quite easy to see that</p><p>$$\langle X | X \rangle = |X|^2$$</p><h2 id="eigen-stuff">Eigen-stuff</h2><p>Since operators are basically matrices, we can also keep the "transformation" picture in mind. That is, when I multiply a vector, or in this case the state vector, by a matrix, I make a change to it. This can be visualized as a passive transformation, i.e. a change of coordinates, or an active transformation, i.e. the state vector itself is transformed.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2021/01/active_passive_transformation.png" class="kg-image" alt="Understanding Uncertainty"></figure><p>Most of the time the direction of a vector changes under such a transformation, but there are a few vectors whose direction remains unchanged; they are called <strong><em>Eigenvectors</em></strong>. Their direction isn't changed, however there is no constraint on how the length changes, so their length is scaled up or down by multiplying it with a real number called an <strong><em>Eigenvalue</em></strong>. For a more intuitive explanation, check out this video by 3Blue1Brown</p><figure class="kg-card kg-embed-card"><iframe width="200" height="113" src="https://www.youtube.com/embed/PFDu9oVAE-g?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><p>That is, in summary, the effect the operator has on that particular vector is equal to multiplying it by a real number.</p><p>$$\hat{O} |\lambda\rangle = \lambda |\lambda\rangle$$</p><p>What does this mean for Quantum Mechanics? Each Eigenvalue represents something that can be measured by applying that operator. 
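</p><p>Here is a small numerical illustration (assuming Python with numpy; the operator entries are made up) of an eigenvector keeping its direction:</p>

```python
import numpy as np

# a made-up Hermitian operator (equal to its own conjugate transpose)
O = np.array([[2, 1 - 1j],
              [1 + 1j, 3]], dtype=complex)
assert np.allclose(O, O.conj().T)

eigenvalues, eigenvectors = np.linalg.eigh(O)
print(eigenvalues)  # real numbers, as expected for something measurable

# the defining relation O|lambda> = lambda |lambda>, checked for the first eigenpair
v = eigenvectors[:, 0]
print(np.allclose(O @ v, eigenvalues[0] * v))  # True: same direction, rescaled length
```

<p>Applying $\hat{O}$ to its eigenvector only rescales it by the (real) eigenvalue, exactly the picture described above. </p><p>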
More importantly, this will always be a real number, as we can only measure real numbers. For a deeper dive into Eigen-stuff, head to <a href="https://elliptigon.com/eigendecomposition/">this article on Eigendecomposition</a>.</p><h1 id="math-interlude-statistics">Math interlude: Statistics</h1><p>In Quantum Mechanics we can only make probabilistic predictions, so we will deal with probability quite a lot. Through it we can define the average of the possible values $\lambda_i$ measured by an operator $\hat{A}$</p><p>$$\langle \hat{A} \rangle = \sum_{i} \lambda_i P(\lambda_i)$$</p><p>where $P(\lambda_i)$ represents the probability of that particular value being measured. We can rewrite this in terms of a state vector $| \psi \rangle$</p><p>$$\langle\psi | \hat{A} |\psi \rangle= \sum_{i}^{} \lambda_i P(\lambda_i)$$</p><p>What we mean by uncertainty is simply how much a particular value, or a variable in general, deviates from the mean; this is captured using the statistical quantity "standard deviation". 
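</p><p>With made-up numbers, the average and the spread look like this (a Python sketch; the eigenvalues and probabilities here are hypothetical):</p>

```python
import numpy as np

# Hypothetical measurable values (eigenvalues) and their probabilities.
values = np.array([-1.0, 0.0, 1.0, 2.0])
probs = np.array([0.1, 0.2, 0.4, 0.3])
assert np.isclose(probs.sum(), 1.0)  # probabilities must sum to 1

# Expectation value: the probability-weighted sum of the values.
mean = np.sum(values * probs)

# Variance: the probability-weighted squared deviation from the mean.
variance = np.sum((values - mean) ** 2 * probs)
std_dev = np.sqrt(variance)
print(mean, variance, std_dev)
```

<p>Here the mean comes out to 0.9 and the variance to 0.89.</p><p>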
To define it, we first define a new operator such that</p><p>$$\bar{A} = \hat{A} - \langle \hat{A}\rangle I$$</p><p>whose Eigenvalues are defined by</p><p>$$\bar{\alpha} = \alpha - \langle \hat{A} \rangle$$</p><p>We can now define the uncertainty or standard deviation of $\hat{A}$, which we call $\sigma^{2}_{A}$, <em>as</em></p><p>$$\sigma^{2}_{A} = \sum_{i}^{} \bar{\alpha}_{i}^{2} P(\alpha_{i})$$</p><p>or as</p><p>$$\sigma^{2}_{A} = \sum_{i} (\alpha_i - \langle \hat{A} \rangle)^{2}P(\alpha_i)$$</p><p>Now if we assume $\langle \hat{A} \rangle = 0$, that is to say that the distribution of $\hat{A}$ is symmetric about zero, then</p><p>$$\sigma_{A}^{2} = \sum_{i}^{} \alpha_{i}^{2} P(\alpha_{i})$$</p><p>which can be written as</p><p>$$\sigma^{2}_{A} = \langle \psi | \hat{A}^{2} |\psi \rangle = \langle \hat{A}^2 \rangle$$</p><h1 id="math-interlude-cauchy-shwarz-inequality">Math interlude: Cauchy-Schwarz inequality</h1><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2021/01/output-onlinepngtools-1-.png" class="kg-image" alt="Understanding Uncertainty"></figure><p>For all triangles as depicted above,</p><p>$$|X| + |Y| \geq |Z|$$</p><p>where $|X|$ is the length of the vector $\vec{X}$. Since $\vec{Z} = \vec{X} + \vec{Y}$, we can also write the last equation as</p><p>$$|\vec{X}| + |\vec{Y}| \geq |\vec{X} + \vec{Y}|$$</p><p>Squaring this equation, it becomes</p><p>$$|\vec{X}|^2 + |\vec{Y}|^2 + 2|\vec{X}||\vec{Y}| \geq |\vec{X} + \vec{Y}|^2$$</p><p>Expanding the right side we get</p><p>$$|\vec{X}|^2 + |\vec{Y}|^2 + 2|\vec{X}||\vec{Y}| \geq |\vec{X}|^2 + |\vec{Y}|^2 + 2(\vec{X}\cdot\vec{Y})$$ </p><p>Cancelling the common terms we find</p><p>$$|\vec{X}||\vec{Y}| \geq \vec{X}\cdot\vec{Y}$$ </p><p>This is called the Cauchy-Schwarz inequality. 
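</p><p>A quick numerical sanity check of the inequality on random vectors (a Python sketch):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# |X||Y| is always at least X dot Y, whatever the vectors are.
for _ in range(1000):
    x = rng.normal(size=3)
    y = rng.normal(size=3)
    lhs = np.linalg.norm(x) * np.linalg.norm(y)
    assert lhs >= np.dot(x, y)               # the form derived above
    assert lhs + 1e-12 >= abs(np.dot(x, y))  # it also holds with an absolute value
print("Cauchy-Schwarz held for 1000 random vector pairs")
```

<p>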
Writing this using the state vectors, $$|X| = \sqrt{\langle X| X \rangle}$$</p><p>$$|Y| = \sqrt{\langle Y| Y \rangle}$$ </p><p>$$|X + Y| = \sqrt{(\langle X | + \langle Y |)(|X\rangle + |Y\rangle)}$$ </p><p>we have, by substituting into the inequality, </p><p>$$\sqrt{\langle X| X \rangle} + \sqrt{\langle Y | Y \rangle} \geq \sqrt{\langle X|X \rangle + \langle Y|Y \rangle + \langle X|Y \rangle + \langle Y|X \rangle}$$ </p><p>Squaring it and simplifying, we find </p><p>$$2|X||Y| \geq |\langle X|Y \rangle + \langle Y|X \rangle |$$</p><p>This is the Cauchy-Schwarz inequality written in terms of state vectors.</p><h1 id="the-uncertainity-principle">The Uncertainty Principle</h1><p>Suppose we have a ket $| \psi \rangle$ and two operators $\hat{A}$ and $\hat{B}$; we define their standard deviations as</p><p>$$\sigma^{2}_{A} = \langle f | f \rangle$$</p><p>$$\sigma^{2}_{B} = \langle g | g \rangle$$</p><p>where</p><p>$$| f \rangle = (\hat{A} - \langle A \rangle)| \psi \rangle$$</p><p>$$| g \rangle = (\hat{B} - \langle B \rangle)| \psi \rangle $$</p><p>We use the Cauchy-Schwarz inequality,</p><p>$$\sigma^{2}_{A} \sigma^{2}_{B} = \langle f  | f \rangle \langle g | g \rangle \geq  {|\langle f|g \rangle|}^{2}$$</p><p>And for any complex number $z$,</p><p>$${|z|}^{2} = {[Re(z)]}^{2} + {[Im(z)]}^{2} \geq {[Im(z)]}^{2}$$</p><p>We then set $z = \langle f | g \rangle$, so that</p><p>$$\sigma^{2}_{A} \sigma^{2}_{B} \geq {\left(\frac{1}{2i} [ \langle f | g \rangle - \langle g | f \rangle]\right)}^{2}$$</p><p>But</p><p>$$\langle f | g \rangle = \langle \psi | ( \hat{A} - \langle A \rangle ) (\hat{B} - \langle B \rangle) | \psi \rangle$$</p><p>$$\langle f | g \rangle = \langle \hat{A}\hat{B} \rangle - \langle \hat{B} \rangle \langle \hat{A} \rangle - \langle \hat{A} \rangle \langle \hat{B} \rangle + \langle \hat{A} \rangle \langle \hat{B} \rangle$$</p><p>$$\langle f | g \rangle = \langle \hat{A}\hat{B} \rangle - \langle \hat{A} \rangle \langle \hat{B} \rangle$$</p><p>Therefore,</p><p>$$\langle g | f \rangle = \langle \hat{B}\hat{A} \rangle - \langle \hat{A} \rangle \langle \hat{B} \rangle$$</p><p>We can then say that</p><p>$$\langle f | g \rangle - \langle g | f \rangle = \langle \hat{A}\hat{B}\rangle -  \langle \hat{B}\hat{A}\rangle= \langle \psi | \hat{A}\hat{B} - \hat{B}\hat{A} | \psi \rangle$$</p><p>which is the same as</p><p>$$\langle f | g \rangle - \langle g | f \rangle = \langle [\hat{A},\hat{B}] \rangle$$</p><p>Putting this all together we get</p><p>$$\sigma^{2}_{A} \sigma^{2}_{B} \geq {\left(\frac{1}{2i}\langle [\hat{A},\hat{B}] \rangle\right)}^{2}$$</p><p>This is called the generalized uncertainty principle. It states that two variables whose operators do not commute cannot both be measured with arbitrary precision simultaneously. </p><h2 id="talking-about-position-and-momentum">Talking about position and momentum</h2><p>We know that observable properties can be represented using operators; here we'll use the position and momentum operators</p><p>$$\hat{x} = x$$</p><p>$$\hat{p} = -i\hbar \frac{\partial}{\partial x}$$</p><p>So we now find the commutator of these operators:</p><p>$$[\hat{x}, \hat{p}] = \hat{x}\hat{p} - \hat{p}\hat{x}$$</p><p>$$[\hat{x}, \hat{p}] = -i\hbar x \frac{\partial}{\partial x} + i\hbar \frac{\partial}{\partial x} x$$</p><p>Now let's apply this to a state vector to obtain its action:</p><p>$$[\hat{x}, \hat{p}] |\psi\rangle = -i\hbar x \frac{\partial |\psi\rangle}{\partial x} + i\hbar \frac{\partial (x|\psi\rangle)}{\partial x}$$ </p><p>$$[\hat{x}, \hat{p}] |\psi\rangle = -i\hbar x \frac{\partial |\psi\rangle}{\partial x} + i\hbar x \frac{\partial |\psi\rangle}{\partial x} + i\hbar |\psi\rangle$$</p><p>$$[\hat{x}, \hat{p}] |\psi\rangle = i\hbar |\psi\rangle$$</p><p>Substituting $\langle [\hat{x}, \hat{p}] \rangle = i\hbar$ into the generalized uncertainty principle,</p><p>$$\sigma_{x}\sigma_{p} \geq \frac{1}{2i} i\hbar$$</p><p>$$\sigma_{x}\sigma_{p} \geq 
\frac{\hbar}{2} $$</p><p>$$\sigma_{x}\sigma_{p} \geq \frac{h}{4 \pi}$$</p><h1 id="capping-it-all-off">Capping it all off</h1><p>We can visualize all of this in the form of waves. The more precisely we measure a wave's wavelength</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2021/01/output-onlinepngtools-2-.png" class="kg-image" alt="Understanding Uncertainty"></figure><p>the less precisely we measure its frequency.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2021/01/output-onlinepngtools-3-.png" class="kg-image" alt="Understanding Uncertainty"></figure><p>Thus, the state vector can be thought of as a wave whose frequency and wavelength somehow correspond to position and momentum, or it is simply a wave in a space where the axes are position and momentum. The uncertainty principle thus isn't just a property of Quantum Mechanics but is a property of waves in general.</p><h2 id="acknowledgements">Acknowledgements</h2><p>We would like to thank Chinmaya Bhargava and Samarth Mishra for picking out errors in the draft.</p><h1 id="references">References</h1><ol><li>Susskind, L. and Friedman, A. Quantum Mechanics: The Theoretical Minimum. Basic Books, 2014</li><li>Schwichtenberg, Jakob. No-Nonsense Quantum Mechanics: a Student-Friendly Introduction. No-Nonsense Books, 2019</li><li>Schwichtenberg, Jakob. Demystifying Gauge Symmetry, 2019</li></ol>]]></content:encoded></item><item><title><![CDATA[My Frame, Your Frame]]></title><description><![CDATA[The very foundations of physics rest on a concept called a "coordinate system". 
It is ridiculously simple to transform from one to another, and this will set the stage for understanding the Theory of Special Relativity.]]></description><link>https://elliptigon.com/my-frame-your-frame/</link><guid isPermaLink="false">600d02c52ab5f40dde4345d0</guid><category><![CDATA[Special Relativity]]></category><category><![CDATA[Einstein]]></category><category><![CDATA[Coordinate Transformation]]></category><category><![CDATA[Lorentz Transformation]]></category><dc:creator><![CDATA[Shanmugha Balan]]></dc:creator><pubDate>Sat, 06 Mar 2021 23:00:00 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2021/02/Screenshot-from-2021-02-28-14-51-10_4x.png" medium="image"/><content:encoded><![CDATA[<img src="https://elliptigon.com/content/images/2021/02/Screenshot-from-2021-02-28-14-51-10_4x.png" alt="My Frame, Your Frame"><p>You're probably sitting and reading this piece on the planet we call Earth. But Earth is just an insignificant rock in the vast Solar System we live in, and we have barely sent a couple of tiny metal boxes to the edge of it. This stellar system is again just a tiny system in the humongous Milky Way galaxy. Going a few more orders of magnitude higher, though this in itself is quite incomprehensible to our logarithmic minds, we end up as a speck in the Laniakea Supercluster, the home to around a hundred thousand galaxies. How do we pinpoint ourselves in this mess?</p><p>Let's say you have an alien friend who wants directions to our place. Let's simplify the problem to identifying ourselves in the Milky Way. If our alien shares the same naming conventions as us, we seem good. But it's an awful assumption that they have the same names for every rock, gas ball, and gravity well out there. We stick to something more uniform for us all, something like math. 
We try to describe our position mathematically, and to do so, we will need a coordinate system.</p><h2 id="rectangular-coordinate-system">Rectangular Coordinate System</h2><p>To explore the idea of a coordinate system, let's return to more friendly, firm ground on Earth. This is a map of Manhattan, New York, USA.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/03/manhattanmap.png" class="kg-image" alt="My Frame, Your Frame"><figcaption>Source: Google Maps</figcaption></figure><p>There are nice rectangular streets in the place, so it would be quite easy to locate a person from a bird's eye view. To mark off specific points, we'd first need a reference point. Let's take Times Square, being such a nice central point, as our reference. We could have chosen something else too, say for example the Empire State Building, but let's go with Times Square. The road north towards Carnegie Hall will serve as one of the axes for our purpose. The other axis will be the road perpendicular to this one, but again passing through Times Square. This leads us to something familiar, the Cartesian coordinate system, or the rectangular coordinate system. Only it's rotated a bit. That feels slightly off. Let's consider every road intersection as a point on the coordinate grid in the system that we just defined. This should put St Patrick's Cathedral at $(2.5, 5.5)$. Now, let's rotate the coordinate system so our y-axis faces north in a setting we are more used to (from Times Square along a line slightly to the left of Time Warner Center), but let's keep Times Square as our center, at the origin. Does this change the position of St Patrick's Cathedral? No, only the coordinates change, and with a piece of paper, we can calculate that. The x coordinate slightly increases while the y coordinate reduces to a rather small value. What if we redefine our origin now, and we shift it to the Empire State Building? 
Again, the position of St Patrick's Cathedral couldn't care less. The only thing we'd have to change is the coordinate value itself. And we can convert between the various systems we define quite easily.</p><h2 id="polar-coordinate-system">Polar Coordinate System</h2><p>What if we radically change our view of a coordinate system? Let's take a look at Connaught Place, New Delhi, India.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/03/connaughtmap.png" class="kg-image" alt="My Frame, Your Frame"><figcaption>Source: Google Maps</figcaption></figure><p>We can try to attack this problem with a similar approach to the last. We could be anarchists and choose Madame Tussauds as our origin, but let's just pick the center of the Central Park and not be masochistic. Let's say we fancy a nice burger, and want to pinpoint the location of Burger Singh. Assuming the direction of true north as the y-axis and the perpendicular direction as the x-axis, we can try something, but that would involve a lot of ugly specific numbers. Can we make our problem simpler? The circles suggest a radial approach to our problem, and that is what we will try. Considering the east as a reference axis, if we rotate a ray $135^\circ$ counterclockwise from the axis, we should land on a line with the eatery. We now determine it's on the second circular road from the center, and we have just defined a new coordinate system that works. This is called the polar coordinate system, where instead of representing coordinates as $(x, y)$, we use $(r, \theta)$ to represent our coordinates. The former is a radial coordinate and the latter is an angular coordinate. 
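</p><p>The conversion between the two systems is mechanical enough to write down directly (a Python sketch using the standard formulas $x = r\cos\theta$ and $y = r\sin\theta$; the numbers are our hypothetical burger run):</p>

```python
import math

def polar_to_rect(r, theta):
    """(r, theta) to (x, y), with theta in radians from the reference axis."""
    return r * math.cos(theta), r * math.sin(theta)

def rect_to_polar(x, y):
    """(x, y) to (r, theta); atan2 picks the correct quadrant."""
    return math.hypot(x, y), math.atan2(y, x)

# Round trip: a point two road-rings out, at 135 degrees from east.
x, y = polar_to_rect(2.0, math.radians(135))
r, theta = rect_to_polar(x, y)
print(round(r, 6), round(math.degrees(theta), 6))  # 2.0 135.0
```

<p>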
And as we saw in the Manhattan example, we can switch between the rectangular system and the polar system just as we switched between the systems there.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/02/Milky_Way_Galaxy.jpg" class="kg-image" alt="My Frame, Your Frame"><figcaption>Source: Wikimedia Commons</figcaption></figure><p>Returning to our original problem, let's now take a spaceship's-eye view of our galaxy. The circular patterns again suggest the polar coordinate system to us. For our origin, we could again choose the center of the galaxy. The reference axis could be the ray from the center to the top of the image. With this, we can say our Solar System should lie in the whereabouts of the point $(\text{Orion-Cygnus Arm}, -45^\circ)$, give or take a few hundred light-years. But where will we be in a hundred thousand years? The Solar System, much like the Earth around the Sun, takes a trip around the center of the galaxy in around 225-250 million years. This poses a small problem for us; we'd have to keep changing our measurement. To keep our coordinate constant, we can instead keep the reference frame moving. Let's change the reference axis to something that's revolving around the center as well, to the ray from the center passing through the tip of the Perseus arm. And this moving reference frame can be used interchangeably with our previous frame by making a major assumption - that the rotation around the galactic center is uniform.</p><h2 id="inertial-reference-frame">Inertial Reference Frame</h2><p>This leads us to our next key idea. Our reference frame can be absolutely anything, provided our coordinates are defined properly. When there is a nice constant velocity, we can simply jump between coordinate systems very easily. This is called a Galilean transformation. This sort of reference frame, where everything differs by only constant relative motion, is called an inertial reference frame. 
The laws of physics are the exact same in every inertial reference frame. To experience a non-inertial reference frame, just get into an elevator. The jerk the elevator gives you while accelerating, for those fleeting moments, puts you in a non-inertial reference frame. You would experience unfriendly unseen forces, by virtue of the acceleration of your frame. If you step out of the elevator and look at it from the ground, in an inertial reference frame, you would be able to point the different forces out, but locked in the elevator, there's no way to really know. Even without such complications, the Galilean transformation still suffers a major flaw, which was exposed by the Theory of Special Relativity, and corrected by the Lorentz transformation.</p><h2 id="why-is-the-galilean-transformation-incorrect">Why is the Galilean transformation incorrect?</h2><p>Let's consider two reference frames, a stationary one, A, and one moving at a constant velocity in one direction, B. Let's assume a coordinate system for A as $(x, y, z, t)$. This four-vector refers to a point in spacetime, also known as Minkowski space. Assuming frame B moves with a velocity of $v$, an observer on frame A would see the observer on frame B move as $x=vt$. Assuming $(x', y', z', t')$ as the corresponding coordinate system for frame B, the conversion from A to B would be straightforward, as described by these equations.</p><p>$x' = x-vt$</p><p>$y' = y$</p><p>$z' = z$</p><p>$t' = t$</p><p>Now let's say there is a light ray from the origin towards B's frame. In the frame of A, it would appear to be moving as $x = ct$. How would it look in the frame of B? Let's use $t'$ to account for time in B's frame. This would mean the ray would appear to move as $x' = ct'-vt'$. So the speed of light would appear to be $c-v$. But with the advent of modern technology and precise measurement techniques, it turns out the speed of light in the frame of B is not $c-v$, but rather just $c$. 
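</p><p>To make the clash concrete, here is the Galilean prediction written out (a Python sketch in units where $c = 1$; experimentally, the measured speed should come out as $c$ no matter what $v$ is):</p>

```python
# Galilean transformation, in units where c = 1.
def galilean(x, t, v):
    """Coordinates (x, t) of frame A as seen from frame B moving at velocity v."""
    return x - v * t, t  # x' = x - vt, t' = t

c = 1.0
v = 0.5                  # frame B moves at half the speed of light
x, t = c * 10.0, 10.0    # a light ray after 10 units of time: x = ct

x_prime, t_prime = galilean(x, t, v)
print(x_prime / t_prime)  # 0.5 -- Galileo predicts speed c - v, experiment says c
```

<p>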
And we're just getting started.</p><h2 id="the-flaw">The Flaw</h2><p>The Galilean transformation equations were much like the Aristotelian laws of motion: they agreed with what was observed at the time. With the constancy of the speed of light in all frames set in stone, both by experiment and by Einstein's theory, we now have to implement a workaround. To do so, we must first capture the flaw. The elusive flaw is the fourth equation in the Galilean transformation. Special relativity changes the very notion of simultaneity and makes it frame-dependent. This idea has also been experimentally verified with atomic clocks on planes and atomic clocks on mountains, testing whether the motion of the reference frame actually has an influence on the ticking of time.</p><h2 id="lorentz-transformation">Lorentz Transformation</h2><p>If the Galilean transformation is wrong, how do we jump from frame to frame? We start with an assumption that space-time is the same everywhere, an infinite, unchanging coordinate space. This says that no event is special, and we can choose our origin arbitrarily. We already played around with this idea in Manhattan; let's apply it to the Universe. The Galilean equation does get a few things right, for instance, that $x=vt$ when $x'=0$. We have to modify this to account for high speeds, but not lose the linear property of this equation. We'll also make another helpful assumption: let's treat the speed of light, $c$, as a dimensionless constant equal to 1. So now, let's multiply the right side by a function of the velocity.</p><p>$x' = (x-vt)f(v)$</p><p>But what is $f(v)$? We have another ace up our sleeve. Our function should not be able to discriminate between left and right. This means that we must lock the sign of the velocity in the function, and a simple way to do this is to square the velocity. 
So now, we have this equation.</p><p>$x' = (x-vt)f(v^2)$</p><p>In the same way, we can modify our equation to account for the time axis.</p><p>$t' = (t-vx)g(v^2)$</p><p>Now if we consider the path of a light ray in both the frames, and apply our principle that the speed of light is constant in all frames, then we find that $x=t$ and $x'=t'$. From this, we get another beautiful piece of symmetry:</p><p>$f(v^2)=g(v^2)$</p><p>We have one last piece of symmetry, which solves our puzzle. And this is that we can't really say which frame is moving: is frame B moving with velocity $v$, or is frame A moving with velocity $-v$? So this means that instead of having $(x, y, z, t)$ for frame A, we could have had $(x', y', z', t')$.</p><p>Now we can just cross-substitute values to give us these twin equations.</p><p>$x=(x'+vt')f(v^2)$</p><p>$t = (t'+vx')f(v^2)$</p><p>With some algebra, we can solve for $f(v^2)$.</p><p>$f(v^2) = \frac{1}{\sqrt{1-v^2}}$</p><p>Finally, we get the prized Lorentz transformation, the heir to the throne.</p><p>$x'=\frac{x-vt}{\sqrt{1-v^2}}$</p><p>$t' = \frac{t-vx}{\sqrt{1-v^2}}$</p><p>To go back to the old unit system we've gotten so used to, all we need to do is ensure our equations are dimensionally consistent, and having taken $c$ as 1, we need to make these changes:</p><p>$x' = \frac{x-vt}{\sqrt{1-\frac{v^2}{c^2}}}$</p><p>$t' = \frac{t-\frac{v}{c^2}x}{\sqrt{1-\frac{v^2}{c^2}}}$</p><p>For small, friendly values of $v$, these equations can be approximated back to the Galilean equations. But for large values, when $v$ is comparable with $c$, the changes become quite drastic. Now we can try to be sneaky and plug in super large values of $v$. If $v$ happens to be greater than $c$, we end up with imaginary values, which is crazy. So we put forward another rule. No object can move faster than light.</p><p>The cool idea here is that we don't have to stretch out too many axes and get lost in a muddle of algebra. 
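</p><p>These final equations can be checked numerically (a Python sketch, again in units where $c = 1$): the boosted light ray still travels at speed 1, and for tiny $v$ the boost is nearly Galilean.</p>

```python
import math

def lorentz(x, t, v):
    """Boost (x, t) into a frame moving at velocity v, in units where c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (x - v * t), gamma * (t - v * x)

# A light ray satisfies x = t; after the boost it still satisfies x' = t'.
x_p, t_p = lorentz(7.0, 7.0, 0.6)
print(x_p / t_p)  # 1.0 -- the speed of light is unchanged

# For a small v, the result is very close to the Galilean x' = x - vt, t' = t.
print(lorentz(7.0, 7.0, 1e-6))
```

<p>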
If we want to jump to a frame moving in some arbitrary direction, we first rotate the axes to align with that direction. Then a simple Lorentz transformation along the new x axis gets us up to speed, and we can just undo the rotation to restore the original orientation.</p><p>We've just taken our first step into a larger, paradoxical, fun world.</p>]]></content:encoded></item><item><title><![CDATA[Beyond Chaos]]></title><description><![CDATA[Entropy is the measure of randomness. It seems like a far-fetched, derived concept used to explain other elaborate things. It could not possibly have a relation with our fundamental forces like gravity. Or could it?]]></description><link>https://elliptigon.com/beyond-chaos/</link><guid isPermaLink="false">600d037d2ab5f40dde4345e4</guid><category><![CDATA[Chaos]]></category><category><![CDATA[physics]]></category><category><![CDATA[thermodynamics]]></category><category><![CDATA[Second Law of Thermodynamics]]></category><dc:creator><![CDATA[Shanmugha Balan]]></dc:creator><pubDate>Sat, 27 Feb 2021 23:00:00 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2021/02/pic.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://elliptigon.com/content/images/2021/02/pic.jpg" alt="Beyond Chaos"><p>Entropy is a profound concept. The second law of thermodynamics may have given entropy the job of being the arrow of time, but it goes way further. Entropy is an important concept in information theory, genetics, economics, and more.</p><p>For example, in genetic and DNA sequencing, Shannon entropy became a standard measure. In <a href="https://medium.com/age-of-awareness/you-probably-dont-understand-economics-because-they-didn-t-teach-you-about-entropy-992beb7cab35">economics</a>, entropy enters again, as many “thermo-economists” have combined these fields using concepts from information theory and statistical physics. Chemical processes, whether reversible or irreversible, take the entropy of the macrostates into account. 
Similarly, these economists conjecture that the direction of economic change may involve the entropy of neighboring microstates.</p><p>Claude E. Shannon, the American mathematician, was one of the first to take entropy out of its limited domain and into a very new field: informatics.</p><h2 id="forces-and-energy">Forces and Energy</h2><p>Most of the forces in our universe are connected, in some way, with energy.</p><p>If we drop a ball from a certain height, its <strong>gravitational potential energy</strong> causes it to fall down. If we hold opposite ends of magnets together, the poles are attracted due to the energy of the magnetic field. The poles were presented with a state having lower energy than the current state they existed in, so they moved that way, exerting a force.</p><p>As a general rule in physics, objects tend to move from a state of higher energy to a state of lower energy.</p><p>To go from the higher state to the lower one, they will exert a force. Thus, conservative forces are given by the change in potential as:</p><p>$$F = -\frac{dU}{dx}$$</p><p>But is it possible to have a force without an energy change?</p><p>The answer is yes.</p><h2 id="entropy-and-energy">Entropy and Energy</h2><p>Let’s return entropy back to where it all started: the arrow of time. A property so close and so similar to energy, it might even substitute for energy. Consider the case of the gas in the piston as shown below.</p><figure class="kg-card kg-image-card"><img src="https://paper-attachments.dropbox.com/s_9F0058CB3D5645E345771B1E9273F81EECF1DD1CC7E9BD4230E1A7B10207ABAE_1576991948660_image.png" class="kg-image" alt="Beyond Chaos"></figure><p>The gas within the piston exerts a force on the walls of the container. As the walls of the container are rigid, they don’t move. The piston, however, is pushed outwards by the gas. The internal energy does not depend on the volume, and the temperature of the gas has not changed. 
So where did the force needed to create this expansion come from? The answer is not energy; it is entropy.</p><p>Thermal fluctuations in a thermodynamic system bring the system towards the macroscopic state with the maximum possible entropy. Therefore, the entropic force is said to be a force that emerges as a result of a system’s statistical tendency to increase its entropy, rather than from any particular force at the atomic scale.</p><h2 id="but-what-exactly-is-an-emergent-phenomenon">But what exactly is an emergent phenomenon?</h2><p>Consider a container with air inside it. The mass of the system can be measured as a whole. Yet the mass comes from the individual masses of the particles of the various gases which compose the air.</p><p>But now consider, for example, me writing this article. I need to think of the words and type them, while I also need a medium in which to put them. The process of me writing, which involves both the computer and me, is an emergent phenomenon, some property or process which can be carried out only by the system as a whole.</p><p>Another example is the saltiness of salt. The salty taste doesn’t come from the Na+ ions or the Cl- ions; it comes from them both. It is an emergent property of the system.</p><figure class="kg-card kg-image-card"><img src="https://cdn.pixabay.com/photo/2018/04/02/20/17/salt-3285024_960_720.jpg" class="kg-image" alt="Beyond Chaos"></figure><p>The elastic force of polymers also has an entropic origin. When a polymer is stretched out, the number of states the system can be in is reduced, since the string of molecules and bonds is pulled taut.</p><p>When the polymer is back in its normal state (relaxed and slack) it can be in many more states, which means higher entropy. 
And since the molecules in the polymer would prefer being in the higher entropy state, they naturally bounce back to their original shape.</p><p>But the strangest way an entropic force manifests itself is in the form of one of the fundamental forces of nature: gravity.</p><p>The gravitational force was described entropically in a model by <a href="https://arxiv.org/abs/1001.0785">Erik Verlinde</a> in 2010. The thermodynamic description of gravity has a history that goes all the way back to 1973, when <a href="https://www.perimeterinstitute.ca/personal/ebianchi/ebianchi-perimeter-article.pdf">Jacob Bekenstein and Stephen Hawking</a> tried to explain the entropy needed by a black hole to comply with the laws of thermodynamics. This was taken further in 1995, when <a href="https://arxiv.org/abs/gr-qc/9504004">Theodore Jacobson</a> showed that Einstein’s field equations describing relativistic gravitation can be combined with thermodynamics with the help of the equivalence principle - the equivalence of gravitational and inertial mass.</p><figure class="kg-card kg-image-card"><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/ErikVerlinde.jpg/450px-ErikVerlinde.jpg" class="kg-image" alt="Beyond Chaos"></figure><p>But what in the world is the difference between them in the first place, for an equivalence principle to prove their equivalence? The inertial mass is very simple: it is the mass we’re so used to in most of our calculations, the measure of how fast an object accelerates given the same force. Gravitational mass, however, is a product of Newton’s Universal Law. It paved the way for the discovery of atoms in a way - that we’re all but stardust, made of the same atoms, the very same mass everywhere. The very same mass which causes a gravitational field. The equivalence principle states that they are equivalent (Euclid’s fourth postulate dabs left and right multiple times). Another example is as follows. 
You’ve all probably heard this analogy sixty nonillion times before, but anyway, consider you are in an elevator. It’s blocked from the outside world (of course) and it’s freely falling. It is impossible to tell whether you are just standing in the middle of the Boötes void (standing in a virtually gravitation-free space) or trying to make another happy landing on Coruscant (freely falling).</p><p>Back to the main point: black holes, gravitational beasts, were being tamed by thermodynamic equations, entropically. This sparked the idea that gravity and entropy are related. While the equivalence principle was the tool to pick out the black hole entropy with the second law there, here the holographic principle is the tool to dissect gravity entropically.</p><p>Now, what is this new principle?</p><p>The holographic principle itself was inspired by black hole thermodynamics. It is a fundamental pillar of string theories and is a supposed property of quantum gravity. It was first proposed by Gerard ‘t Hooft. Leonard Susskind gave it a string theory interpretation by combining ‘t Hooft’s ideas with his own. It states that the description of a volume of space can be thought of as entropically encoded as information on a lower dimensional boundary of the region. Something like a 4D volume encoded on its 3D boundary, loosely speaking, or a 3D volume encoded on its 2D boundary, its surface area.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2021/01/image-2.png" class="kg-image" alt="Beyond Chaos"></figure><p>Let’s go along a tangent. The equipartition theorem is a pretty important part of classical, statistical thermodynamics. A system has multiple states it can be in, and the set of all states constitutes its phase space. The degrees of freedom of the system are the dimensions of the phase space so generated. 
The equipartition theorem states that the energy of the system is equally divided between its degrees of freedom.</p><p>Let’s go along a normal. The Unruh effect is a weird prediction of quantum field theory. From the point of view of an accelerating observer, empty space (vacuum) emits black body radiation at a temperature proportional to the acceleration. To put it very simply, if you measure the temperature in a proper vacuum while in an accelerating frame, the thermometer would show a non-zero reading. The Unruh temperature also has the same form as the Hawking temperature of a black hole. Hence, it is also called the Hawking-Unruh temperature. This effect may seem rather counter-intuitive. But bear in mind that vacuum, in modern terms, is not defined as space devoid of matter, but rather as the lowest possible energy state of the quantized fields that make up the universe.</p><figure class="kg-card kg-image-card"><img src="https://paper-attachments.dropbox.com/s_9F0058CB3D5645E345771B1E9273F81EECF1DD1CC7E9BD4230E1A7B10207ABAE_1576993020495_image.png" class="kg-image" alt="Beyond Chaos"></figure><p>Now let’s put it all together.</p><p>Let’s start from the part we got from the holographic principle. Let us encode $N$ bits of information into an area, $A$. 
where $l_P$ is the Planck length,</p><p>$$N = \frac{A}{{l_P}^2}$$</p><p>Also, from the equipartition theorem, we get,</p><p>$$E = \frac{1}{2}{k_B}NT$$</p><p>From the Unruh effect,</p><p>$$T = \frac{\hbar a}{2 \pi c k_B}$$</p><p>Now, as we are doing this derivation classically, we take Newton’s Second Law for a test mass $m$,</p><p>$$F = ma$$</p><p>Putting in a rearranged form of the Unruh equation,</p><p>$$F = m \times \frac{2 \pi c k_B T}{\hbar}$$</p><p>Substituting the equipartition theorem,</p><p>$$F = \frac{4 \pi c m E}{N \hbar}$$</p><p>From the mass-energy equivalence principle, $E = Mc^2$,</p><p>$$F = \frac{4 \pi c^3 M m}{\hbar N}$$</p><p>Putting in the encoded information $N$ and substituting the standard formula for the Planck length, ${l_P}^2 = \frac{\hbar G}{c^3}$,</p><p>$$F = \frac{4 \pi GMm}{A}$$</p><p>If we take the holographic screen as a sphere of radius $r$ and put in its surface area, $A = 4 \pi r^2$,</p><p>$$F = \frac{GMm}{r^2}$$</p><p>Hey, now doesn't that look familiar!</p><p>But alas, beauty can only take you so far. We must test for accuracy as much as for beauty. Though entropic gravity has, in its current form, been able to reproduce the Einstein field equations, it would have to be tested at the Lagrange points - the points where the gravitational pulls of two bodies balance. The theory has also been challenged on other formal grounds. Nevertheless, it still shows how truly beautiful physics can be.</p>]]></content:encoded></item><item><title><![CDATA[I'm Radioactive, Radioactive]]></title><description><![CDATA[The first nuclear plant was not made by the Americans during the Second World War. 
Then when and where was it?]]></description><link>https://elliptigon.com/first-nuclear-reactor/</link><guid isPermaLink="false">5d0d1df839bad75541bdd31c</guid><category><![CDATA[Radioactivity]]></category><category><![CDATA[Oklo]]></category><category><![CDATA[Natural Reactor]]></category><category><![CDATA[First Reactor]]></category><category><![CDATA[Francis Perrin]]></category><category><![CDATA[1.7 billion]]></category><category><![CDATA[Before Life]]></category><category><![CDATA[Nuclear Safety]]></category><dc:creator><![CDATA[Shanmugha Balan]]></dc:creator><pubDate>Sun, 07 Feb 2021 23:00:00 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1517925035435-7976539b920d?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1517925035435-7976539b920d?ixlib=rb-1.2.1&q=80&fm=jpg&crop=entropy&cs=tinysrgb&w=1080&fit=max&ixid=eyJhcHBfaWQiOjExNzczfQ" alt="I'm Radioactive, Radioactive"><p>Here's a quick question: Where was the first nuclear fission plant on Earth located? Who built it and oversaw its operation?</p><p>Pfft. Is this a fifth-grade pop quiz? We all know that one. It was made by Enrico Fermi and his associates at the University of Chicago.</p><p>Wrong.</p><figure class="kg-card kg-image-card"><img src="https://paper-attachments.dropbox.com/s_F8B5820A247F49E5B64C5747BAC8759263D6211D091494BB8C00AC976714BCAC_1562585925987_chicago+pile-1.jpg" class="kg-image" alt="I'm Radioactive, Radioactive"></figure><p>OK, then it was probably the warring Germans during the World War, or someone from the Soviet Union, right?</p><p>Wrong again.</p><p>If you thought <a href="https://elliptigon.com/radio-call-from-space-taxis/">nature only knew about lasers</a>, you thought wrong. The first-ever nuclear reactor lay in Gabon, a tiny country in Africa. 
Moreover, it wasn't manufactured by a despotic megalomaniac or <a href="https://www.newyorker.com/magazine/2006/09/04/no-mercy">a deranged physicist who tried to murder his professor with a poisoned apple</a>.</p><p>No, it was a creation of Mother Nature.</p><figure class="kg-card kg-image-card"><img src="https://cdn.pixabay.com/photo/2017/07/18/14/17/cassiopeia-2515913_960_720.jpg" class="kg-image" alt="I'm Radioactive, Radioactive"></figure><p>But we're getting ahead of ourselves. To understand this story in its entirety, we need to take a step back, to May 1972.</p><p>Francis Perrin, a French nuclear physicist, sat in his room, scratching his head about the fresh news that the uranium ore at Oklo had a lower-than-usual concentration of uranium-235. The natural abundance of uranium-235 is 0.72%, but the concentration at Oklo was 0.717%. You may think that this is a tiny discrepancy, but bear in mind, from mass-energy equivalence, that it could mean a lot of energy. Like, a real lot.</p><p>After hours of work, Perrin decided that there could be only one explanation: around 1.7 billion years ago, a self-sustaining nuclear reaction occurred.</p><h2 id="i-m-waking-up-to-ash-and-dust-">I'm waking up to ash and dust...</h2><p>But how did he come to the conclusion that giant explosions (well technically "releases of energy") were a thing before humans started messing around with science? Well, he had evidence.</p><p>At the site of the supposed nuclear reaction, fissile uranium-235 (the kind that makes nuclear reactions work) made up about 3% of the uranium in the ore at the time, which is considered abundant. Most of the current nuclear reactors in the world also enrich their fuel to roughly this ratio (given certain financial constraints). So the uranium at the site definitely <strong>was</strong> enough to reach critical mass at the time.</p><p>But does the presence of a fissile isotope guarantee a nuclear reaction? 
No, we need more concrete evidence.</p><h2 id="i-wipe-my-brow-and-i-sweat-my-rust-">I wipe my brow and I sweat my rust...</h2><p>The next ingredient for a controlled nuclear reaction is the moderator. Without a moderator to slow down the super-energetic neutrons spewed out by fission, there would be no controlled fission reaction. In fact, no chain reaction would happen in the first place.</p><p>Fortunately, Oklo is an open-pit mine that's inundated with water, which served the job just as well as the moderators used in nuclear reactors today.</p><p>However, the fission reaction was not continuous: the water in the pit would boil off and stop the reaction, and when it cooled and condensed, the reaction restarted. Each cycle took around 3 hours.</p><figure class="kg-card kg-image-card"><img src="https://paper-attachments.dropbox.com/s_F8B5820A247F49E5B64C5747BAC8759263D6211D091494BB8C00AC976714BCAC_1562590846793_okloopenpitmine.jpg" class="kg-image" alt="I'm Radioactive, Radioactive"></figure><h2 id="i-m-breathing-in-the-chemicals-">I'm breathing in the chemicals...</h2><p>But wait a minute, how do we know that all this happened (in about the same time it takes to watch a Hollywood blockbuster, if I may add)?</p><p>One way to prove this is to look for products and their signatures. Here's what we (as in, us humans) found:</p><ul><li>Xenon, a fission product of this nuclear reaction, has five isotopes as products. All five isotopes were found in the rock formations of the pit mine.</li><li>Neodymium, a lanthanide, is one of the products of nuclear fission of uranium-235. 
A variety of its isotopes were found in the region, and their percentage composition at Oklo was determined.</li></ul><p>The researchers then did some math-ing around and observed that the graph of fission product signatures at Oklo was almost an exact match with the corresponding charts from a modern nuclear reactor.</p><figure class="kg-card kg-image-card"><img src="https://paper-attachments.dropbox.com/s_F8B5820A247F49E5B64C5747BAC8759263D6211D091494BB8C00AC976714BCAC_1562590858834_oklograph.jpg" class="kg-image" alt="I'm Radioactive, Radioactive"></figure><p>One thing to note, though, is that corrections were made to account for the natural occurrence of neodymium at Oklo. Notice the complete absence of Nd-142 and the higher-than-natural proportion of Nd-143.</p><h2 id="i-feel-it-in-my-bones-enough-to-make-my-systems-blow-">I feel it in my bones, enough to make my systems blow...</h2><p>The uranium deposit in the reactor was by no means small. A whopping five tons of uranium underwent fission at this location. Yeah, that's a big boom, I know.</p><p>The uranium veins in the mine varied in size from mere centimeters to around five meters. The fission of just a single atom of uranium-235 provides approximately 200 million electron volts of energy.</p><p>If you do the math, you'll find that the energy produced by the Oklo reactor is a staggering $8.2 \times 10^{16} J$. That's more than the energy released by a thousand Little Boys, copies of the atomic bomb dropped on Hiroshima.</p><figure class="kg-card kg-image-card"><img src="https://paper-attachments.dropbox.com/s_F8B5820A247F49E5B64C5747BAC8759263D6211D091494BB8C00AC976714BCAC_1562590990305_MH42_35501811991.jpg" class="kg-image" alt="I'm Radioactive, Radioactive"></figure><h2 id="i-raise-my-flags-don-my-clothes-it-s-a-revolution-i-suppose-">I raise my flags, don my clothes, it's a revolution I suppose...</h2><p>But what does this mean to you? 
Why should you care if some ancient pit mine in a random corner of the world had a nuclear reaction long before the first humans popped up and started beating each other up with stone clubs?</p><p>The answer lies in the hidden secrets of the Oklo reactor. Despite not having the safety measures of today's reactors, the Oklo reactor never underwent a meltdown or an explosion.</p><p>The open-pit mine was narrow, yet locked nearly all products in its veins. The products were not scattered across the region and remained in a relatively confined space. Though there was nothing around back then to feel the effects had it exploded, it still provides a good safety lesson for today's reactors. The water regulated the reaction extremely efficiently and ensured that the plant didn't go supercritical.</p><h2 id="we-ll-paint-it-red-to-fit-right-in-">We'll paint it red to fit right in...</h2><p>Now, how did our Oklo reactor handle the nasty stuff that comes with the word "nuclear"?</p><p>Cesium, a dangerous by-product of the nuclear reaction, was found to be captured by ruthenium.<br>Other by-products of the reaction include the radioactive gases krypton and xenon. These were found to be absorbed by aluminum phosphate minerals. The grains of these minerals can contain these gases for billions of years.</p><h2 id="all-systems-go-sun-hasn-t-died-">All systems go, sun hasn't died...</h2><p>The KBS-3 method by SKB, a Swedish nuclear waste management company, takes inspiration from the mine at Oklo. The spent fuel will be locked into copper canisters with iron inserts, which will be placed within bentonite clay, an absorbent aluminum phyllosilicate clay similar to the aluminates found at Oklo.</p><p>The clay will prevent the fuel from contaminating water or seeping into the rock. It will also protect the copper canister from corrosion. This setup is placed deep into the Swedish bedrock. 
SKB estimated that it will remain isolated from the rest of the world for around 100,000 years.</p><figure class="kg-card kg-image-card"><img src="https://paper-attachments.dropbox.com/s_F8B5820A247F49E5B64C5747BAC8759263D6211D091494BB8C00AC976714BCAC_1562595747222_main-qimg-44a4c434b67f91d5142c5764ed06db92.jpg" class="kg-image" alt="I'm Radioactive, Radioactive"></figure><h2 id="deep-in-my-bones-straight-from-inside-">Deep in my bones, straight from inside...</h2><p>For all physics folks reading this who think that the Oklo reactor has nothing in store for them, here's a fun little fact: the Oklo reactor may be relevant to our understanding of the fine structure constant, $\alpha$.</p><p>$\alpha$ characterizes the strength of the electromagnetic interaction between elementary charged particles. There was a hypothesis that its value might have changed over time. Since $\alpha$ influences the rates of various nuclear reactions, Oklo gives us a way to test this.</p><p>Here's where things get interesting: Sm-149 captures a neutron to become Sm-150, and the rate of this neutron capture depends on the value of $\alpha$. If the ratio of the two samarium isotopes at Oklo is found, it can be used to calculate the value of the fine structure constant roughly 2 billion years ago.</p><p>By analyzing the relative concentrations of the isotopes, researchers found that the nuclear reactions of the past proceeded almost the same way they do today. This implies that $\alpha$ hasn't changed appreciably either. Interesting indeed.</p><h2 id="whoa-oh-oh-oh-i-m-radioactive-radioactive-">Whoa, oh, oh, oh, I'm radioactive, radioactive…</h2><p>This blast from the past is a fascinating insight into the nature (pun intended) of nuclear reactions.</p><p>A lot of valuable research has been slowed down by a general misconception of radioactivity. Many people think that it is an unstoppable force of evil and must be totally shunned. 
The public, in general, is simply not clear on nuclear.</p><p>MRI scanners, for example, were initially called NMRI (short for <strong>N</strong>uclear <strong>M</strong>agnetic <strong>R</strong>esonance <strong>I</strong>maging). The N was dropped to prevent the public phobia from getting out of hand. While some nuclear programs are indeed questionable, this is not the case everywhere. Regardless, the progress of science has been slowed down by exaggerated fears of the negative effects.</p><p>But this topic has already been explored extensively, so we'll leave it to the experts to handle the rest.</p><!--kg-card-begin: html--><iframe width="560" height="315" src="https://www.youtube.com/embed/videoseries?list=PLhktpKUhnCIynFSioXNg9RCpxrGXWftYo" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><!--kg-card-end: html--><p>While tampering with atoms is dangerous, if we know how to do it properly and responsibly, like the Oklo reactor, we'll be just fine, man.</p>]]></content:encoded></item><item><title><![CDATA[Disentangling Entanglement]]></title><description><![CDATA[The mathematics of "spooky action at a distance"]]></description><link>https://elliptigon.com/disentangling-entanglement/</link><guid isPermaLink="false">6005ce9a2ab5f40dde43442c</guid><category><![CDATA[physics]]></category><category><![CDATA[quantum mechanics]]></category><category><![CDATA[quantum foundations]]></category><dc:creator><![CDATA[Pugazharasu Anancia Devaneyan ]]></dc:creator><pubDate>Sat, 23 Jan 2021 23:00:00 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1589149098258-3e9102cd63d3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MXwxMTc3M3wwfDF8c2VhcmNofDF8fHF1YW50dW18ZW58MHx8fA&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img 
src="https://images.unsplash.com/photo-1589149098258-3e9102cd63d3?crop=entropy&cs=tinysrgb&fit=max&fm=jpg&ixid=MXwxMTc3M3wwfDF8c2VhcmNofDF8fHF1YW50dW18ZW58MHx8fA&ixlib=rb-1.2.1&q=80&w=2000" alt="Disentangling Entanglement"><p>Quantum mechanics is one of the youngest fields of modern physics. Since its establishment, it has revolutionized our understanding of nature and of how particles behave at scales smaller than an atom. The theory was developed in the early 20th century to explain the various quantum-scale phenomena observed by pioneering physicists of the time such as J.J. Thomson, Albert Einstein, Niels Bohr, and Max Planck. Ever since, physicists have continued to probe this mysterious realm, and the deeper they went, the stranger it became: the existing laws of physics just didn’t make sense there. In this article, we’ll try to make sense of the concepts of “Spin” and “Quantum Entanglement”. What makes these concepts special is that there are no classical analogs for them; they apply only to quantum mechanical systems.</p><h2 id="math-">Math!</h2><p>Without further ado, let's set up the mathematics behind entanglement first! To start off, let's look at some of the "fancy" math symbols we commonly use:</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:left">Symbol</th>
<th style="text-align:center">Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">$\forall$</td>
<td style="text-align:center">For all</td>
</tr>
<tr>
<td style="text-align:left">$\exists$</td>
<td style="text-align:center">There exists a</td>
</tr>
<tr>
<td style="text-align:left">$\in$</td>
<td style="text-align:center">In</td>
</tr>
<tr>
<td style="text-align:left">$|$ or :</td>
<td style="text-align:center">Such that</td>
</tr>
<tr>
<td style="text-align:left">:=</td>
<td style="text-align:center">Defined as</td>
</tr>
<tr>
<td style="text-align:left">$\rightarrow$</td>
<td style="text-align:center">Maps to</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><p>We will often use these symbols to represent ideas succinctly and precisely. We will simply review the underlying mathematics and not prove things. However, if you wish to see the proofs, you can consult any of the references.</p><h3 id="vector-spaces">Vector Spaces</h3><p>A linear vector space, or simply a vector space, $\mathbb{V}$ is a set along with the multiplication $(.)$ and addition $(+)$ operations defined over a field $F$ (here, the set of all complex numbers $\mathbb{C}$), such that the following axioms hold:</p><ul><li><strong>Commutativity:</strong> $| U \rangle + | V \rangle = | V \rangle + | U \rangle$</li><li><strong>Associativity:</strong> $(| U \rangle + | V \rangle) + | W \rangle = | U \rangle + (| V \rangle + | W \rangle)$</li><li><strong>Additive Identity:</strong> $\exists \  | 0 \rangle \in \mathbb{V} \ | \ | V \rangle + | 0 \rangle = | 0 \rangle + | V \rangle = | V \rangle$</li><li><strong>Additive Inverse:</strong> $\forall \ | V \rangle \ \exists \ | {-V} \rangle \ | \ | V \rangle + | {-V} \rangle = | 0 \rangle$</li><li><strong>Multiplicative identity:</strong> $\exists \ 1 \in F \ | \ 1 . | V \rangle = | V \rangle$</li><li><strong>Distributive properties:</strong></li><li>$(\alpha + \beta) | U \rangle = \alpha | U \rangle + \beta | U \rangle$</li><li>$(\alpha \beta) | V \rangle = \alpha (\beta | V \rangle)$</li></ul><p>Here, $\alpha , \beta \in F$ and $| U \rangle,| V \rangle,| W \rangle \in \mathbb{V}$. 
For more, check out</p><figure class="kg-card kg-embed-card"><iframe width="200" height="113" src="https://www.youtube.com/embed/TgKwz5Ikpc8?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><h3 id="dual-space">Dual Space</h3><p>Every vector space $\mathbb{V}$ has a dual space $\mathbb{V}^{*}$ whose elements map elements of $\mathbb{V}$ to $F$ i.e.</p><p>$$\forall \ |V\rangle \in \mathbb{V} \ \exists \ \langle W| \in \ \mathbb{V}^{*}: \ \langle W| : |V\rangle \rightarrow F$$</p><p>Now, given such a map $\phi\in V^*$ and a vector $|v\rangle$, it is always possible to 'evaluate' the vector via $\phi(|v\rangle)$. This is natural in the sense that no other structure is needed to define it (contrast this with the isomorphism $V\to V^*$, which requires the construction of a basis of $V$, which is indeed an extra structure). This evaluation is defined more formally as a duality.</p><p>A duality is a triple $(X,Y,f)$ over the field $F$ consisting of two vector spaces $X$ and $Y$ over $F$ and a bilinear map $f:X\times Y\to F$ such that for every nonzero $x \in X$ the map $y \mapsto f(x,y)$ is nonzero, and vice versa. So, the triple $(V,V^*,f)$ is indeed a duality, where $f$ is the evaluation map, that is, $f(|v\rangle,\langle w|)=\langle w||v\rangle.$</p><p>Why is this important? Well, because once we say that our vector spaces are indeed Hilbert spaces, one can define $\langle w||v\rangle:=\langle w|v\rangle$ as the natural inner product on the space! This, along with completion (to be defined later), ultimately says that for any $\phi\in V^*$, there exists a unique vector $|v_\phi\rangle$ such that $\phi(x)=\langle v_\phi|x\rangle$ for all $x\in V$. This is the famous Riesz representation theorem. This then automatically gives us a semi-bilinear (linear in the first argument, anti-linear in the second) map from $V$ to $V^*$. 
The map is $$\phi:V\to V^*\ \ \text{defined by}\ \ y\to \phi_y=\langle\cdot|y\rangle$$ Riesz guarantees this map is bijective, and hence Dirac's famous result that every vector $|v\rangle$ has a dual $\langle v|$.</p><h3 id="the-inner-product">The Inner Product</h3><p>The inner product is a generalization of the dot product that we are familiar with. It is an operation $\langle . | . \rangle$ defined by the following properties:</p><ul><li><strong>Conjugate symmetry:</strong> $\langle V | W \rangle = { \langle W | V \rangle}^{*}$</li><li><strong>Positive semidefiniteness:</strong> <em>$\langle V | V \rangle \geq 0$,</em> with equality only when <em>$| V \rangle = | 0 \rangle$</em></li><li><strong>Linearity in the vectors:</strong> <em>$\langle U | (\alpha | V \rangle+\beta | W \rangle) = \langle U | \alpha | V \rangle + \langle U | \beta | W \rangle = \alpha \langle U |  V \rangle+ \beta \langle U |  W \rangle$</em></li></ul><p>Here, $\alpha, \beta \in \mathbb{C}$ and $| U \rangle,| V \rangle,| W \rangle \in \mathbb{V}$ and $\langle U| ,\langle V |, \langle W| \in \mathbb{V}^{*}$</p><h3 id="linear-maps">Linear Maps</h3><p>A linear map/transformation is simply a transformation $\hat{O}$ that</p><ul><li>Adds inputs or outputs, $\hat{O}(| V \rangle + | W \rangle) = \hat{O}(| V \rangle) + \hat{O}(| W \rangle)$</li><li>Scales the inputs or outputs, $\hat{O}(\alpha | V \rangle) = \alpha \hat{O}(| V \rangle)$</li></ul><p>Here, $\alpha \in \mathbb{C}$ and $| V \rangle, | W \rangle \in \mathbb{V}$</p><h3 id="cartesian-product">Cartesian Product</h3><p>Here the symbol "$\times$" means "<strong>Cartesian Product</strong>" i.e. its action on two sets $\mathbb{A}$ and $\mathbb{B}$ gives the set of all ordered pairs $(a, b)$ where $a \in \mathbb{A}$ and $b \in \mathbb{B}$</p><h3 id="tensor-product">Tensor Product</h3><p>The tensor product $v \otimes w \ \forall \ v \in \mathbb{V}, w \in \mathbb{W}$ is a bilinear function on $\mathbb{V}^{*} \times \mathbb{W}^{*}$, i.e. an element of the set of all bilinear functions which act on pairs $(h,g) \in \mathbb{V}^{*} \times \mathbb{W}^{*}$.</p><h3 id="hilbert-spaces">Hilbert Spaces</h3><p>A Hilbert space is a vector space $\mathcal{H}$ that</p><ul><li>has a norm, i.e. a length, defined as $||V|| =  \sqrt{\langle V | V\rangle}$</li><li>is complete, i.e. all of its Cauchy sequences converge</li></ul><p>A Cauchy sequence is simply a sequence whose terms become arbitrarily close to one another as the sequence progresses. Let's take a sequence $a_{1},a_{2}... a_{n}$ and a function $d(a_{m},a_{n})$ that tells us the distance between the terms $a_{m}$ and $a_{n}$; we call it a Cauchy sequence if</p><p>$$\lim_{\min(m,n) \rightarrow \infty} d(a_{m},a_{n}) = 0$$</p><p>One can visualize this as the plot of a damped oscillator.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/01/1280px-Cauchy_sequence_illustration.svg.png" class="kg-image" alt="Disentangling Entanglement"><figcaption>File:Cauchy sequence illustration.svg. (2020, October 10). Wikimedia Commons, the free media repository.</figcaption></figure><h2 id="quantum-mechanics-101">Quantum Mechanics 101</h2><p>Now that the general math concepts (which we will be relying on heavily in the upcoming sections) have been established, we can dive deeper into the technical aspects of a quantum mechanical system.</p><h3 id="the-state-vector">The State Vector</h3><p>In Quantum Mechanics, we start with an object called the state vector $| \psi \rangle \in \mathcal{H}$</p><ul><li>All the information about the system is contained in it</li><li>The position basis representation of the state vector is called the wavefunction $\psi (\vec{x}, t) = \langle x | \psi \rangle$</li></ul><h3 id="observables">Observables</h3><p>In Quantum Mechanics, observable quantities such as position and momentum are promoted to linear maps, i.e. observables. 
Acting with an observable on a state vector lets you measure the physical quantity that corresponds to the map. These maps are Hermitian/self-adjoint, i.e.</p><p>$$\hat{O} = \hat{O}^{\dagger} = {(\hat{O}^{*})}^{T}$$</p><p>Where $*$ refers to complex conjugation and $T$ to the transpose operation, i.e. swapping the rows and columns. The values of a quantity that can actually be observed are the eigenvalues obtained from the eigenvalue equation of the map acting on a state vector (here the action of the observable simply scales the state vector, changing only its length); state vectors that obey this equation are called eigenvectors or eigenstates</p><p>$$\hat{O} | \psi \rangle = \lambda | \psi \rangle$$</p><p><strong>A few corollaries:</strong></p><p>The eigenvalues are always real numbers</p><p>The eigenstates form an orthonormal (mutually perpendicular and normalized, i.e. of length 1) basis set</p><!--kg-card-begin: markdown--><p>$$ \langle \lambda_{i} | \lambda_{j} \rangle = \delta_{ij} = \begin{cases} 1, &amp; \text{if } i=j \\ 0, &amp; \text{if } i \neq j \end{cases} $$</p>
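Both corollaries are easy to check numerically. Here is a minimal sketch (our own illustration, not from the article) using NumPy's `eigh` routine for Hermitian matrices; the matrix `O` below is an arbitrary example:

```python
import numpy as np

# An arbitrary 2x2 Hermitian matrix, O = O^dagger (illustrative only)
O = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])
assert np.allclose(O, O.conj().T)  # confirm it is self-adjoint

# eigh diagonalizes Hermitian matrices; the eigenvalues come back real
eigvals, eigvecs = np.linalg.eigh(O)

# Corollary 1: the eigenvalues are real numbers
assert np.isrealobj(eigvals)

# Corollary 2: the eigenstates are orthonormal, <lambda_i|lambda_j> = delta_ij
gram = eigvecs.conj().T @ eigvecs
assert np.allclose(gram, np.eye(2))
```

Swapping in any Hermitian matrix of your choosing leaves both assertions passing; for a general non-Hermitian matrix, `np.linalg.eig` would typically return complex eigenvalues instead.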
<!--kg-card-end: markdown--><p>However, the catch is that we can't definitively say that we will measure a particular eigenvalue since quantum mechanics (at least as we understand it) is probabilistic. So we rely on Born's rule to compute probabilities for a given eigenstate</p><!--kg-card-begin: markdown--><p>$$ P(\lambda) = {|\langle \lambda | \psi \rangle|}^{2} $$</p>
<!--kg-card-end: markdown--><p>Where $P(\lambda)$ is the probability of measuring the eigenvalue $\lambda$. Let's say we have a state vector of the form,</p><p>$$| \psi \rangle = \frac{1}{\sqrt{2}} | u \rangle + \frac{1}{\sqrt{2}} | d \rangle$$</p><p>Both $| u \rangle$ and $| d \rangle$ have a 50-50 probability of being measured (think of Born's rule here!). However, suppose we measure and discover that the eigenstate $| u \rangle$ is what we've measured. This process is inherently random and completely different from <a href="http://physics.mq.edu.au/~jcresser/Phys301/Chapters/Chapter15.pdf">how quantum states evolve when they are left unmeasured</a>. Moreover, it erases any information we may have had about $| d \rangle$; our state vector $| \psi \rangle$ has essentially been collapsed to $| u \rangle$</p><p>$$| \psi \rangle \rightarrow | u \rangle$$</p><p>This is the so-called measurement problem. Why do the processes have to be different? There are many conjectures that have been put forth to solve this; one notable conjecture comes from Sir Roger Penrose, who suggested that gravity causes this non-linear process. This is yet another open problem so let's not dive into that rabbit hole yet!</p><h2 id="expectation-values">Expectation Values</h2><p>As a workaround, we can talk about the average of all possible eigenvalues $\lambda_i$ of an operator $\hat{A}$ in a given state</p><p>$$\langle \hat{A} \rangle = \sum_{i}^{} \lambda_i P(\lambda_i)$$</p><p>Where $P(\lambda_i)$ represents the probability of that particular eigenvalue being measured. We can rewrite this expectation value in terms of a state vector $| \psi \rangle$</p><p>$$\langle\psi | \hat{A} |\psi \rangle= \sum_{i}^{} \lambda_i P(\lambda_i) = \langle \hat{A} \rangle$$</p><h1 id="spin">Spin</h1><p>The initial idea of “spin” came from Wolfgang Pauli, who first defined it as a “non-classical two-valued quantity”. 
He introduced spin as an extra degree of freedom of quantum mechanical particles, based on the emission spectra of the outermost-shell electrons of the alkali metals. This led him to introduce the Pauli exclusion principle, which states that no two identical quantum particles (fermions) can possess the same set of quantum numbers.</p><p>Once this idea was introduced, physicists tried to comprehend what this “spin” meant physically. Some said that it was produced by the actual "spinning" of the electron. This picture was later dropped when it was found that for an electron to exhibit actual “spinning” of the required magnitude, its surface would have to move faster than the speed of light, which goes against Einstein’s theory of special relativity. Hence the search for the physical meaning of “spin” began.</p><p>In 1925, two Dutch physicists, George Uhlenbeck and Samuel Goudsmit, theoretically deduced the physical nature of "spin" from the experiment conducted by Otto Stern and Walther Gerlach to determine the magnetic moment of the electron in the ‘s’ orbital.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/01/UhlenbeckKramersGoudsmit.jpg" class="kg-image" alt="Disentangling Entanglement"><figcaption>File: UhlenbeckKramersGoudsmit.jpg. (2006, February 12). Wikimedia Commons, the free media repository.</figcaption></figure><p>They chose silver as the source since it has a $5s^{1}$ valence electronic configuration. The emitted atoms are passed through a non-uniform magnetic field and then allowed to strike a photographic plate, on which bands appear corresponding to the valence electron’s magnetic quantum number, ‘m’. 
The ‘s’ orbital has magnetic quantum number $m = 0$, so the electron’s path through the magnetic field should be straight, without any deviation, since it has zero orbital magnetic moment; this should result in a single band observed on the photographic plate. But here’s the catch: on deeper inspection, two bands were observed on the photographic plate, corresponding to angular momentum values of ‘$-\frac{\hbar}{2}$’ and ‘$+\frac{\hbar}{2}$’ (since electrons possess the half-integer spin projections $-\frac{1}{2}$ and $+\frac{1}{2}$)</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2021/01/Schematic-configuration-of-the-Stern-Gerlach-experiment.png" class="kg-image" alt="Disentangling Entanglement"><figcaption>File: <a href="https://www.researchgate.net/profile/Alexandre_Gondran/publication/256703436/figure/fig6/AS:323440998404102@1454125751826/Schematic-configuration-of-the-Stern-Gerlach-experiment.png">Schematic-configuration-of-the-Stern-Gerlach-experiment.png</a> Gondran, Michel &amp; Gondran, Alexandre. (2013). Measurement in the de Broglie-Bohm Interpretation: Double-Slit, Stern-Gerlach, and EPR-B. Physics Research International. 2014. 10.1155/2014/605908.</figcaption></figure><p>This result was attributed to the unexplained parameter Pauli had called “spin”. From this experiment, spin was defined to be an intrinsic form of angular momentum possessed by elementary, or quantum mechanical, particles. Mathematically, Pauli also devised a set of spin operators, called the Pauli spin matrices, which act on state vectors. 
The spin matrices are as follows,</p><p>$$\sigma_{x} = \begin{pmatrix}0 &amp; 1 \\1 &amp; 0  \end{pmatrix} \\~\\ \sigma_{y} = \begin{pmatrix}0 &amp; -i \\i &amp; 0  \end{pmatrix} \\~\\ \sigma_{z} = \begin{pmatrix}1 &amp; 0 \\0 &amp; -1  \end{pmatrix}$$</p><p>This experimental model was later used to find the spin values of various particles like protons, neutrons, neutrinos, etc.</p><h1 id="entanglement">Entanglement</h1><h2 id="tensor-product-in-quantum-mechanics">Tensor Product in Quantum Mechanics</h2><p>In the context of our area of interest, quantum mechanical systems and quantum states, the tensor product is used to describe systems containing multiple subsystems, wherein each subsystem is described by a vector in a vector space, which in this case is a Hilbert space. Say we have a state $|A \rangle$ and another state $|B\rangle$; then the tensor product of these two states can be written as $|A\rangle \otimes |B\rangle = |AB\rangle$.</p><p>So, $|AB\rangle$ is an example of a combined state. When performing a tensor product between two states, the states can come from different vector spaces. For example, let’s consider a system of a coin: we have two possible states in this case, heads and tails, represented as $|H\rangle$ and $|T\rangle$. Next, we consider another system of a dice: we have 6 possible states, ranging from 1 to 6, represented by $|1\rangle, |2\rangle$ and so on till $|6\rangle$. So, if we were to perform a tensor product between the states of these two systems, we would have the states of the resulting vector space be $|H1\rangle, |H2\rangle$ and so on till $|H6\rangle$, and $|T1\rangle, |T2\rangle$ and so on till $|T6\rangle$. 
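The coin-and-dice bookkeeping above can be checked numerically. Here is a short sketch (our own illustration; the array names are assumptions, not from the article) using `np.kron`, which implements the tensor product for coordinate vectors:

```python
import numpy as np

# Coin basis: |H>, |T> in a 2-dimensional space
H = np.array([1.0, 0.0])   # |H>
T = np.array([0.0, 1.0])   # |T>

# Dice basis: |1>..|6> as the rows of a 6x6 identity matrix
dice = np.eye(6)

# |H3> = |H> tensor |3> lives in the 2 * 6 = 12-dimensional product space
H3 = np.kron(H, dice[2])
assert H3.shape == (12,)

# It is itself a basis vector of the combined system: a single 1, rest 0
assert np.count_nonzero(H3) == 1 and H3[2] == 1.0
```

Building all twelve products `np.kron(c, d)` for `c` in the coin basis and `d` in the dice basis reproduces exactly the combined basis $|H1\rangle$ through $|T6\rangle$ described above.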
The first half of the state label describes the state of the first system (the coin), and the second half describes the state of the second system (the die).</p><p>Now, let's have a look at the properties of this tensor product when applied to a quantum mechanical system. Let’s consider two systems, ‘A’ and ‘B’: one of them receives a penny and the other a dime, assigned at random, so each system has a 50-50 chance of receiving either coin. Let $\sigma$ represent the operator acting on a system to give an output. Say that when $\sigma$ acts on $|Penny\rangle$ it gives the result “+1” and when it acts on $|Dime\rangle$ it gives “-1”. The expectation values of the operator $\sigma$ are then,</p><p>$\langle \sigma_{A} \rangle = 0 , \ \langle \sigma_{B} \rangle = 0$.  But $\langle \sigma_{A} \sigma_{B} \rangle = -1$, since at every observation exactly one of the systems A or B receives the dime, so the product of the two outcomes is always –1.</p><p>Hence,  $\langle \sigma_{A} \sigma_{B} \rangle = -1 \neq \langle \sigma_{A}\rangle \langle\sigma_{B} \rangle$ .</p><h2 id="two-spin-states">Two-Spin States</h2><p>Let’s consider a pair of spins, each with the basis states $|u\rangle$ and $|d\rangle$. Let the spin operator of system ‘A’ be ‘$\sigma$’ and that of system ‘B’ be ‘$\tau$’. Let the basis for the combined system of these two spins be $|uu\rangle, |ud\rangle, |du\rangle, |dd\rangle$. From what we have discussed previously, if $\sigma$ acts on $|u\rangle$ it gives us “+1”, and “-1” for $|d\rangle$. The same is the case with $\tau$.
Hence, we have the following set of equations:</p><p>$$\sigma |uu\rangle = |uu\rangle\\\sigma |ud\rangle = |ud\rangle\\\sigma |du\rangle = -|du\rangle\\\sigma |dd\rangle = -|dd\rangle\\\tau |uu\rangle = |uu\rangle\\\tau |ud\rangle = -|ud\rangle\\\tau |du\rangle = |du\rangle\\\tau |dd\rangle = -|dd\rangle $$</p><p>As we can see in the above equations, the result of the $\sigma$ operator acting on the combined states depends only on the first half of the state label, while the result of the $\tau$ operator depends only on the second half. </p><p>Quantum entanglement is a quantum mechanical phenomenon that occurs when a pair or group of particles interact, or share spatial proximity, in such a way that the quantum state of each particle cannot be described independently of the state of the others. In other words, the combined quantum mechanical system cannot be represented as a tensor product of the individual states of its subsystems. Now we shall see why that is the case for such combined states.</p><h2 id="entangled-states">Entangled States</h2><p>A state need not be either maximally entangled or not entangled at all: entanglement comes in degrees, and we can have maximally entangled, relatively strongly entangled, or weakly entangled states. The strength of entanglement can vary depending on the nature of the particles.</p><p>The maximally entangled state of interest here is called the singlet state. It is mathematically represented as,</p><p>$$|sing\rangle = \frac{1} {\sqrt{2}}  (|ud\rangle - |du\rangle)  $$</p><p>The singlet state cannot be written as a product state: attempting to factor it forces one of the coefficients to be zero.</p><p>There are three more maximally entangled states, called the triplet states.
They can be represented as:</p><p>$$\frac{1}{\sqrt{2}} \ (|ud\rangle + |du\rangle)  \\ \frac{1}{\sqrt{2}} \ (|uu\rangle + |dd\rangle) \\ \frac{1}{\sqrt{2}} \ (|uu\rangle - |dd\rangle)$$</p><p>There is a reason why the triplet states are mentioned separately from the singlet state. We shall look into that in a little while.</p><p>Let us recall the example of the penny and the dime systems, where the expectation values of the individual operators were zero. The same can happen for the individual spins of a spin pair:</p><p>$$\langle \sigma_{x} \rangle^2 + \langle \sigma_{y} \rangle^2 + \langle \sigma_{z} \rangle^2 = 0$$</p><p>Spin polarization is the degree to which the spin, i.e., the intrinsic angular momentum of an elementary particle, is aligned with a given direction. In accordance with the spin polarization principle, for any single spin in a pure state there exists a direction along which the spin component gives the output "+1". Hence, for a product state, the expectation values of the individual spin components satisfy,</p><p>$$\langle \sigma_{x} \rangle^2 + \langle \sigma_{y} \rangle^2 + \langle \sigma_{z} \rangle^2 = 1$$</p><p>The above condition tells us that not all the expectation values can be zero.</p><p>This holds true for all product states. However, the singlet state, which is not a product state, doesn’t follow the spin polarization principle.
In fact, for the $|sing\rangle$ state this sum results in zero.</p><p>We know that,</p><p>$$|sing\rangle = \frac{1} {\sqrt{2}} (|ud\rangle - |du\rangle)  $$</p><p>To evaluate $\langle \sigma_{z} \rangle$, we take the expectation value in the singlet state,</p><p>$$\langle \sigma_{z} \rangle = \langle sing | \sigma_{z} | sing \rangle$$</p><p>Using $\sigma_{z} |ud\rangle = |ud\rangle$ and $\sigma_{z} |du\rangle = -|du\rangle$,</p><p>$$\sigma_{z} |sing\rangle = \frac{1}{\sqrt{2}} (|ud \rangle + |du \rangle)$$</p><p>so that,</p><p>$$\langle \sigma_{z} \rangle = \frac{1}{2} (\langle ud| - \langle du|) (|ud \rangle + |du \rangle) = 0$$</p><p>Repeating the calculation for the other components gives,</p><p>$$\langle \sigma_{x} \rangle = \langle \sigma_{y} \rangle  = \langle \sigma_{z} \rangle  = 0$$</p><p>If the expectation value of a component of $\sigma$ is zero, it means that the experimental outcome is equally likely to be “+1” or “-1”. In other words, the outcome is uncertain. Despite knowing the exact state vector, $|sing\rangle$, we know nothing about the outcome of any measurement of any component of either spin; one might suspect that we as observers simply do not know all the details there are to know about $|sing\rangle$. But according to the rules of quantum mechanics, there is nothing that can be known beyond what is encoded in the state vector $|sing\rangle$. The state vector gives the most complete description of the system that it is possible to make.
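These vanishing expectation values can be checked numerically. The sketch below (an illustration, not part of the original derivation) builds the singlet state with NumPy and lets each Pauli matrix act on the first spin via the tensor product:

```python
import numpy as np

# Pauli matrices and the single-spin basis states
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
u = np.array([1, 0], dtype=complex)   # |u>
d = np.array([0, 1], dtype=complex)   # |d>

# Singlet state |sing> = (|ud> - |du>)/sqrt(2), built with the tensor product
sing = (np.kron(u, d) - np.kron(d, u)) / np.sqrt(2)

# A spin component of the first spin acts as sigma tensor I on the pair
for s in (sx, sy, sz):
    op = np.kron(s, I2)
    print(np.vdot(sing, op @ sing).real)   # 0.0 for every component
```

Every component of the first spin has expectation value exactly zero, even though the state vector itself is known exactly.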
This bothered Einstein so much that he dismissed the phenomenon as “spooky action at a distance”: we can know everything there is to know about the combined system, yet know nothing about the properties of its individual constituents.</p><p>The reason why the singlet is described separately is that the singlet is an eigenvector with its own eigenvalue, whereas the triplets are all eigenvectors sharing a single degenerate eigenvalue.</p><p>Quantum entanglement is a strange and spooky phenomenon that upended our understanding of the quantum realm and has shown us just how strangely quantum particles can behave.</p><p>The EPR paradox is a consequence of this phenomenon: each particle is individually in an uncertain state until it is measured, at which point its state becomes certain. The reason this is considered a paradox is that the particles would seemingly have to communicate information to each other faster than the speed of light, violating relativity.</p><p>Quantum mechanics contains more such phenomena that just don’t seem to make sense when we try to comprehend them. This is because it isn’t completely deterministic like classical physics, and the mere act of observation destroys information that was once available in the system. We are bound to encounter more such strange phenomena as we try to understand the nature of the quantum realm, which makes us question our preconceptions of reality and shows how much they can differ from the actual truth.</p><h1 id="ackowledgements">Acknowledgements</h1><p>The authors would like to thank Prof. Joseph Prabagar for inspiring us to write this article and Prof. Geno Kadwin for his insights into the mathematical aspects. A shoutout to Aishwarya Girish Kumar for her comments on the first draft.</p><h1 id="references">References</h1><ul><li>Shankar, R. (2014). Principles of Quantum Mechanics.
New York, NY: Springer.</li><li>Jeevanjee, N. (2016). Introduction to Tensors and Group Theory for Physicists. Birkhauser Verlag AG.</li><li>Sakurai, J. J., &amp; Napolitano, J. (2021). Modern Quantum Mechanics. Cambridge: Cambridge University Press.</li><li>Kumar, M. (2008). Quantum: Einstein, Bohr, and the Great Debate About the Nature of Reality. London: Icon Books.</li><li>Susskind, L., &amp; Friedman, A. (2014). Quantum Mechanics: The Theoretical Minimum. Basic Books.</li></ul>]]></content:encoded></item><item><title><![CDATA[An Approachable Derivation of the Rayleigh-Jeans Law]]></title><description><![CDATA[<h2 id="premise">Premise</h2><p>We can consider a black body to consist of electromagnetic radiation in thermal equilibrium with the cavity walls. When they are in thermal equilibrium, the average rate of radiation emission equals their average rate of absorption of radiation.</p><p>The Rayleigh-Jeans theory was constructed on the notion that when the</p>]]></description><link>https://elliptigon.com/raleigh-jeans/</link><guid isPermaLink="false">5fba5c872ab5f40dde4342d7</guid><category><![CDATA[thermodynamics]]></category><dc:creator><![CDATA[Rishi Kumar]]></dc:creator><pubDate>Sun, 22 Nov 2020 23:30:00 GMT</pubDate><content:encoded><![CDATA[<h2 id="premise">Premise</h2><p>We can consider a black body to consist of electromagnetic radiation in thermal equilibrium with the cavity walls. When they are in thermal equilibrium, the average rate of radiation emission equals the average rate of radiation absorption.</p><p>The Rayleigh-Jeans theory was constructed on the notion that when the walls and the radiation are in thermal equilibrium, the temperature of the walls is equal to the "temperature" of the radiation. We will see what we mean by the "temperature" of an electromagnetic wave.</p><p>We take the walls of the cavity to consist of charged particles oscillating about their equilibrium positions, each coupled to a standing-wave mode of the electromagnetic field.
This follows from Maxwell's theory of electromagnetic waves, which states that an accelerating charged particle radiates an electromagnetic wave. A point to note is that the oscillating charge's frequency is equal to the frequency of its coupled electromagnetic wave. It is therefore safe to say that in thermal equilibrium, the average energy of the oscillating charge is equal to the average energy of the coupled standing-wave mode of the electromagnetic field.</p><p>Now, the oscillating particle has a quadratic potential energy $H_{pot}=\frac{1}{2}aq^2$ and a kinetic energy $H_{kin}=\frac{p^2}{2m}$, so according to the equipartition theorem, in thermal equilibrium the average energy is,</p><p>$$\langle H\rangle = \langle H_{pot}\rangle + \langle H_{kin}\rangle = \frac12 k_B T + \frac12 k_B T=k_B T$$</p><p>Hence, the wave's energy is also taken to be $k_BT$, and the wave can be thought of as having a "temperature" of $T$. This forms the foundation of the Rayleigh-Jeans theory, following which we will derive the Rayleigh-Jeans formula.</p><h2 id="deriving-the-rayleigh-jeans-formula">Deriving the Rayleigh-Jeans Formula</h2><p>We start with the fact that the energy distribution of black-body radiation does not depend on the cavity's shape (which can be verified experimentally). For ease of calculation, we take the shape of the cavity to be a cube. We also assume that the waves vanish at the walls, or in other words, do not pass through them.</p><p>The number of standing electromagnetic waves in a cube of length $L$ needs to be calculated.</p><p>Let us take the wave equation for a standing electromagnetic wave, </p><p>$$ \frac{\partial^2E_x}{\partial x^2}+\frac{\partial^2E_x}{\partial y^2}+\frac{\partial^2E_x}{\partial z^2}+k^2E_x = 0 $$</p><p>where $E_x=E_x(x,y,z)$ and $k=\frac{2\pi}{\lambda}=\frac{2\pi f}{c}$.
Assuming that $E_x=u(x)v(y)w(z)$ (separation of variables), we can separate the wave equation into three ordinary differential equations of the type,</p><p>$$\frac{d^2u} {dx^2}+k^2_xu=0$$</p><p>where $k^2=k^2_x+k^2_y+k^2_z$.</p><p>By inspection, we can see that this is the equation of a simple harmonic oscillator and has the solution,</p><p>$$u(x)=B\cos k_xx+C\sin k_xx$$</p><p>Applying the boundary conditions that $E_x$ (and hence $u$) is 0 at $x=0$ and at $x=L$ leads to $B=0$ and $k_xL=n_x\pi$ where $n_x=1,2,3,...$ (since we are considering standing electromagnetic waves and look only at the positive region of the $k$-space). Similar solutions are obtained for $v(y)$ and $w(z)$, giving the solution,</p><p>$$E_x(x,y,z)=A\sin (k_xx)\sin(k_yy)\sin(k_zz)$$</p><p>where $$k^2=\frac{\pi ^2}{L^2}(n^2_x+n^2_y+n^2_z)$$ and $n_x$, $n_y$ and $n_z$ are positive integers.</p><p>The last expression gives us the distance from the origin to a point in $k$-space, often called the "reciprocal" space (due to the units of $k$ being $(\text{length})^{-1}$).</p><p>Let us take a coordinate system corresponding to the $k$-space (see the figure below), with the axes being $k_x$, $k_y$, and $k_z$. We know that $k_x=n_x\pi/L$, $k_y=n_y\pi/L$, and $k_z=n_z\pi/L$, so the points in $k$-space are separated by $\pi/L$ along each axis, and there is one standing wave in $k$-space per $(\pi/L)^3$ of volume. The number of standing waves, $N(k)$, having wavenumbers between $k$ and $k+dk$ is then simply the volume between $k$ and $k+dk$ divided by $(\pi/L)^3$.
The volume between $k$ and $k+dk$ is the volume of a spherical shell of thickness $dk$ multiplied by $1/8$ (since we need only the positive octant of the $k$-space, hence $1/8$ of the volume of the shell), so that</p><p>$$N(k)dk=\frac{\frac12 \pi k^2 dk} {(\pi/L)^3}= \frac{Vk^2dk} {2\pi^2}$$ </p><p>where $V=L^3$ is the volume of the cavity.</p><p>For any electromagnetic wave, there are two perpendicular polarizations for each mode, so $N(k)$ should be increased by a factor of 2, becoming, $$\frac{N(k)dk}{V}=\frac{k^2dk}{\pi^2}$$ Using the expression $k=2\pi f/c$ to obtain $k$ and $dk$ and substituting into this gives us $N(f)$, $$N(f)df=\frac{8\pi f^2}{c^3}df$$ From this, the number of modes per unit volume between $\lambda$ and $\lambda + d\lambda$ can be derived by using the expression $f=c/\lambda$ to get $\lambda$ and $d\lambda$, giving, $$N(\lambda)d\lambda=\frac{8\pi}{\lambda^4}d\lambda$$ Now, each mode of oscillation has an energy of $k_BT$, so the energy in the range $\lambda$ to $\lambda+d\lambda$ is $k_BTN(\lambda)d\lambda$. Hence the energy density in this region is,</p><p>$$u(\lambda)d\lambda = k_BTN(\lambda)d\lambda=\frac{8\pi k_BT}{\lambda^4}d\lambda$$</p><p>This is the Rayleigh-Jeans expression for the spectral density in the range $\lambda$ to $\lambda+d\lambda$. Treating the energy as a continuous variable, the average energy per oscillator is $k_BT$ and the Rayleigh-Jeans formula for $u(\lambda)$ holds. The Rayleigh-Jeans formula also behaves perfectly well for long wavelengths in the electromagnetic spectrum.
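This long-wavelength behaviour can be checked numerically against Planck's law (which the article turns to at the end); the snippet below is a sketch for illustration, not part of the derivation:

```python
import math

h, c, kB = 6.62607015e-34, 2.99792458e8, 1.380649e-23  # SI constants

def u_rayleigh_jeans(lam, T):
    """Rayleigh-Jeans spectral energy density, 8*pi*kB*T/lam**4."""
    return 8 * math.pi * kB * T / lam**4

def u_planck(lam, T):
    """Planck spectral energy density, used here only for comparison."""
    x = h * c / (lam * kB * T)
    return (8 * math.pi * h * c / lam**5) / math.expm1(x)

T = 300.0
# At a long (1 cm) wavelength the ratio is very close to 1...
print(u_rayleigh_jeans(1e-2, T) / u_planck(1e-2, T))
# ...but at 1 micron Rayleigh-Jeans overestimates wildly
print(u_rayleigh_jeans(1e-6, T) / u_planck(1e-6, T))
```

The ratio of the two densities is $(e^{x}-1)/x$ with $x = hc/\lambda k_B T$, which tends to 1 as $\lambda \to \infty$ and blows up as $\lambda \to 0$.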
It also agrees with Wien's scaling form, $$u(\lambda)=\frac{8\pi k_BT}{\lambda^4}=\frac{f(\lambda T)}{\lambda^5}$$ However, we will see in the next section why this cannot be the correct scaling function.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2020/12/Oscillation-modes.png" class="kg-image"><figcaption>Visualising the k-space</figcaption></figure><h2 id="failure-of-the-rayleigh-jeans-theory-in-explaining-the-stefan-boltzmann-law">Failure of the Rayleigh-Jeans theory in explaining the Stefan-Boltzmann Law</h2><h3 id="incorrect-scaling-function">Incorrect Scaling function</h3><p>From the previous equation for the scaling function, we can see that $f(\lambda T)=8\pi k_B\lambda T$. From this, we notice that as $\lambda$ decreases, $u(\lambda)$ increases without bound. This means that, at any temperature, ever shorter wavelengths carry ever more energy. For example, according to this law a campfire would emit copious short-wavelength ultraviolet and X-ray radiation (which would be very deadly, but thankfully that isn't how things work in nature). Hence, the law fails in this regard.</p><h3 id="the-ultraviolet-catastrophe">The Ultraviolet Catastrophe</h3><p>Inspecting the Rayleigh-Jeans formula and attempting to find the total energy density of the black body (by integrating with appropriate limits) gives us a startling result,</p><p>$$ u=\int_{0}^{\infty}u(\lambda)d\lambda=\int_{0}^{\infty}\frac{8\pi k_B T}{\lambda^4}d\lambda=\infty $$</p><p>Here we see that the energy density is infinite, which is clearly nonsensical: it implies that a cavity filled with radiation radiates an infinite amount of energy. Paul Ehrenfest named this the "Ultraviolet Catastrophe". However, Stefan found that the energy radiated is proportional to $T^4$.
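The divergence can be made concrete by truncating the integral at a short-wavelength cutoff $\lambda_{min}$: the truncated integral evaluates in closed form to $\frac{8\pi k_B T}{3\lambda_{min}^3}$, which grows without bound as the cutoff shrinks. A quick numerical sketch (for illustration only):

```python
import math

kB = 1.380649e-23  # Boltzmann constant, J/K

def total_energy(lam_min, T):
    # The integral of 8*pi*kB*T/lam**4 from lam_min to infinity
    # evaluates in closed form to 8*pi*kB*T / (3*lam_min**3)
    return 8 * math.pi * kB * T / (3 * lam_min**3)

T = 300.0
for lam_min in (1e-6, 1e-7, 1e-8, 1e-9):
    print(f"{lam_min:.0e} m cutoff: {total_energy(lam_min, T):.3e} J/m^3")
# Every factor-of-10 reduction in the cutoff multiplies the total
# energy density by 1000: the integral diverges as lam_min -> 0
```

A finite measured total (proportional to $T^4$) is irreconcilable with a diverging integral.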
Hence, attempting to explain the Stefan-Boltzmann Law using the Rayleigh-Jeans formula for energy density is doomed to fail.</p><h2 id="consequences">Consequences</h2><p>As the Rayleigh-Jeans formula failed to address shorter wavelengths, Planck used a different approach to explain the black-body radiation curve. He chose not to assume that an oscillator's average energy in the wall was $k_BT$. He knew how $u(\lambda)$ varies for short wavelengths, using Wien's formula, and wanted $u(\lambda)$ to be proportional to $T$ for longer wavelengths. This led to the formulation of Planck's formula, which perfectly described the radiation curve.</p><h2 id="references">References</h2><ol><li>Bowley, R., &amp; Sanchez, M. (1999). Introductory Statistical Mechanics. Oxford: Clarendon.</li><li>Zemansky, M. W. (1957). Heat and Thermodynamics.</li></ol><h2 id="reading-list">Reading List</h2><ul><li>Modes of Oscillations, UMD Physics</li><li>Deriving the Rayleigh-Jeans Law, chem.libretexts.org</li><li>The Rayleigh-Jeans Law and its Derivation, applet-magic.com</li><li>Rayleigh-Jeans Law, wikipedia.org</li><li>Equipartition Theorem, wikipedia.org</li></ul>]]></content:encoded></item><item><title><![CDATA[Charting The Stars]]></title><description><![CDATA[A student's perspective on data-driven astronomy]]></description><link>https://elliptigon.com/charting-the-stars/</link><guid isPermaLink="false">5fa4de432ab5f40dde4340df</guid><category><![CDATA[astronomy]]></category><category><![CDATA[machine learning]]></category><category><![CDATA[data]]></category><category><![CDATA[Statistics]]></category><dc:creator><![CDATA[V Mukund]]></dc:creator><pubDate>Sat, 07 Nov 2020 11:40:00 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1462332420958-a05d1e002413?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=2000&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ" medium="image"/><content:encoded><![CDATA[<img
src="https://images.unsplash.com/photo-1462332420958-a05d1e002413?ixlib=rb-1.2.1&q=80&fm=jpg&crop=entropy&cs=tinysrgb&w=2000&fit=max&ixid=eyJhcHBfaWQiOjExNzczfQ" alt="Charting The Stars"><p>Modern astronomy has evolved a lot. It's no longer a field populated by cranky old men who fill their garages with lenses and telescopes. No. At the heart of the field that aims to chart out the cosmos lies something far more valuable than measurement apparatus – data.</p><p>With the amount of information that we receive from various telescopes and other sources, we can no longer rely solely on manpower to map the universe (in case you didn't know, it's a pretty big place).</p><p>So, as humans have been doing for at least the last half-century, we offload the tasks of detection, classification and interpretation to computers.</p><p>In this article, we're going to talk about some of the techniques that modern astronomers use day-to-day to handle the large influx of observational data, and how these computational techniques hold a lot of promise for further enhancing our perception of the universe in the future.</p><h2 id="image-stacking">Image Stacking</h2><p>Often in astronomy we try to detect signals in the presence of noise. We usually work with grayscale images where dark points indicate high flux density and the gray background is noise. Most of the dark dots you see in an image are distant radio galaxies, but some are objects in our own galaxy, such as pulsars or supernova remnants.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2020/11/im1.png" class="kg-image" alt="Charting The Stars"></figure><p>In radio astronomy, flux density is measured in units of janskys (a unit equivalent to $10^{-26}$ watts per square meter per hertz).</p><p>In other words, flux density is a measure of the spectral power received by a telescope detector of unit projected area.
In the grayscale image picked up by our telescope, we're interested in measuring the apparent brightness of a pulsar at a given frequency.</p><p>We typically call something a "detection" if the flux density is more than five standard deviations above the noise in the local region. So if we look for radio emission at the locations of all known pulsars, sometimes we find detections, but most of the time we don't.</p><p>When we don't detect something, it could be for a lot of reasons. The pulsar could be too far away, the intrinsic emission may not be strong at these frequencies, or the emission could be intermittent and switched off at the time we measure it.</p><p>Now you might think we can't say anything about the pulsars we don't detect. After all, how can we derive <em>anything</em> from a non-detection?</p><p>However, astronomers have developed clever techniques for doing just that, one of which is stacking.</p><p>Stacking allows us to measure the statistical properties of objects we can't detect individually. It works because the noise in a radio image is roughly random, with a Gaussian distribution centered on zero. When you add regions of an image that contain only noise, the random numbers cancel out; but when you add regions that contain signals, the signals add together coherently, increasing the signal-to-noise ratio.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2020/11/im2.png" class="kg-image" alt="Charting The Stars"><figcaption>An example of noise-cancellation through interference of waves</figcaption></figure><p>The undetected pulsars are located all over the sky, so to stack them we first need to shift their positions so they're centered on the same pixel. Our stacking process looks something like this.
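In NumPy, the mean-stacking step reduces to a few lines once the cut-outs are re-centred. The sketch below uses synthetic images (hypothetical data, not the article's) with a faint common source at the centre pixel:

```python
import numpy as np

rng = np.random.default_rng(42)

# 500 synthetic 20x20 "images": pure Gaussian noise plus a faint source
# at the centre pixel, far too weak to detect in any single image
images = rng.normal(0.0, 1.0, size=(500, 20, 20))
images[:, 10, 10] += 0.5          # faint signal, well below 5 sigma

# Mean stack: average every pixel position across the whole set
mean_stack = images.mean(axis=0)

# The noise averages towards zero (std ~ 1/sqrt(500) ~ 0.045),
# while the signal adds coherently and stands out in the stack
print(mean_stack[10, 10])           # close to 0.5
print(np.abs(mean_stack).argmax())  # 210, i.e. flattened pixel (10, 10)
```

The stacked noise level drops as $1/\sqrt{N}$ with the number of images, which is why the invisible 0.5-unit source becomes a clear detection.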
To calculate the mean stack, we just take the mean of each pixel position across all the images and form a new image from the result.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2020/11/im3.png" class="kg-image" alt="Charting The Stars"><figcaption>An example of stacking on real astronomical images</figcaption></figure><h2 id="bin-approx-algorithm">Bin Approx Algorithm</h2><p>When we implemented median stacking, we ran into the problem of memory usage. Our naive solution involved keeping all the data in memory at the same time, which required much more memory than is available on a typical computer.</p><p>There are different solutions to this problem. Some of them require more money. Some require re-framing the problem. Some involve developing a smarter solution. </p><p>Perhaps the simplest solution (although it does require the 💵) is to buy a better computer. To some extent this is a good solution, but of course as soon as the data size increases, we'd be back to the same problem.</p><p>The third approach is to improve our algorithm. Currently the problem is that calculating the median requires us to store all the data in memory. Can we calculate a running median that doesn't need all of the data to be loaded in memory at the same time?</p><p>A solution to this is the bin approx. algorithm, which works as follows.</p><p>As each image comes in, take the value of each pixel in the image and place it in a bin. Once all of the images have been processed, you end up with a histogram of counts for each pixel position. Because the bins in the histogram are ordered, you can sum up the counts starting from the smallest bin until you reach half the total number of values.
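A minimal sketch of this running median (assuming the approximate range of the data is known in advance, so the bin edges can be fixed):

```python
import numpy as np

def binapprox_median(value_stream, lo=-5.0, hi=5.0, nbins=1000):
    """Approximate the median of a stream of values using a fixed
    histogram, so only bin counts (not the data) stay in memory."""
    counts = np.zeros(nbins, dtype=int)
    width = (hi - lo) / nbins
    n = 0
    for value in value_stream:      # e.g. the same pixel from each image
        idx = min(nbins - 1, max(0, int((value - lo) / width)))
        counts[idx] += 1
        n += 1
    # Walk the ordered bins until we pass half the total count
    running = 0
    for idx, c in enumerate(counts):
        running += c
        if running >= (n + 1) / 2:
            return lo + (idx + 0.5) * width   # bin midpoint as the median

values = np.random.default_rng(0).normal(size=1000)
print(binapprox_median(values))   # close to 0, the true median
```

The memory footprint is fixed by the number of bins, not by the number of images, at the cost of an error no larger than half a bin width.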
You then use the value of the resulting bin as your median.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2020/11/im4.png" class="kg-image" alt="Charting The Stars"><figcaption>An overview of the bin approx algorithm</figcaption></figure><p>To put some real numbers in this example, let's generate 1,000 random numbers from a normal distribution. When we apply the bin approx. algorithm, we find a median of approximately zero, which is exactly what we'd expect. So, what happens when we do this stacking? We end up with an image that shows a clear detection in the central few pixels.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2020/11/im5.png" class="kg-image" alt="Charting The Stars"><figcaption>Result of applying bin approx on an image sequence</figcaption></figure><p>It may not look too impressive initially, but let's take a step back and think about what we're seeing here.</p><p>We took a large set of images in which we could not detect any individual pulsars. When we line the images up so that all of the undetected pulsars are located in the center of the image, and then calculate the median across all of the images, we see a detection. What you're seeing here is a statistical detection of a population of pulsars that are too faint to see in our original data set.</p><p>We've used a really simple technique to probe a bit of the invisible universe that you couldn't otherwise see using the telescope alone!</p><h2 id="cross-matching">Cross Matching</h2><p>When we create a catalogue from survey images, we start by extracting a list of sources, galaxies, and stars using source-finding software. There are many different packages we can use (`SExtractor` is a common one), but most of them work in a similar way.</p><p>Basically, they run through the pixels in an image and find peaks that are statistically significant.
Then, they group the surrounding pixels and fit a function, usually based on the telescope's response, which is called the 'beam' or the 'point spread function'. This results in a list of objects, each of which has a position, an angular size, and an intensity measurement. The uncertainties on these measured values depend on things like the noise in the image, the calibration of the telescope, and how well we can characterize the telescope's response function.</p><p>Once we have our catalogues, cross-matching involves searching the second catalogue for a counterpart to each object in the first catalogue. To do this, we usually search within a given radius, based on the uncertainties in the positions. For each source in the first catalogue, we look through each source in the second catalogue, calculating the angular distance between each pair; if that distance is less than our search radius and is the smallest offset we've seen so far, we record the pair as the current best match.</p><p>When we run this on two catalogues, we end up with a list of matches: a galaxy in the first catalogue, its counterpart in the second catalogue, and the great-circle offset between them.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2020/11/im6.png" class="kg-image" alt="Charting The Stars"><figcaption>Cross matching on two catalogs</figcaption></figure><h2 id="machine-learning-in-astronomy">Machine Learning in Astronomy</h2><p>We know that humans are exceptionally good pattern matchers.</p><p>For example, research shows we can recognize other human faces within hundreds of milliseconds of seeing them. Perhaps it isn't surprising that we have these skills; our early survival depended on them. As scientists, we've adapted these skills to tasks such as stellar spectral classification.
Often, human classifiers have a list of criteria or heuristics that they use to make decisions, combined with an overall intuition about the data.</p><p>So how can we train a computer program to do something we can't even express? </p><p>Early on, researchers attempted to encapsulate this intuition with an unreasonably large number of hard-coded rules to cover as many cases as possible. These types of systems can be successful, but they're slow to develop and they lack the ability to deal with unexpected inputs or ambiguous classes.</p><p>In contrast, machine learning algorithms don't depend on specific rules. Instead, they try to discover or learn patterns from the input data we give them. There are two broad types of machine learning algorithms: unsupervised and supervised. </p><p>Unsupervised algorithms try to discover patterns in data, whereas supervised algorithms learn from classified input data so they can classify unknown examples. For example, in the context of astronomy, the redshifts for the training data might be calculated using spectroscopic techniques, which are generally a highly reliable way of measuring redshift. In other cases, you might use a gold-standard data set classified by human experts. Another option is to use training data classified by many non-experts: Galaxy Zoo is a massive citizen science project that has made scientific discoveries through the work of dedicated amateurs and intelligent machine learning methods. The next step is to extract features that represent your input data in some way; these features encode the properties of the various objects, and a classifier such as a decision tree uses them to arrive at a classification. Having selected features to represent the objects, you can train your classifier on this known data set. The process of training basically consists of building a model that maps your inputs to the result.
The model will lack accuracy for the first couple of thousand examples, but as we keep feeding more inputs to the algorithm, the accuracy increases along with it. The more we train the machine, the more accurately it can classify unknown objects.</p><h2 id="decision-tree-classifiers">Decision Tree Classifiers</h2><p>Decision trees are probably the easiest machine learning algorithm to understand, because their representation is similar to how we'd like to think a logical human might make decisions.</p><p>Let's consider an example: Should I play tennis today?</p><p>A machine "learns" how to make this decision based on training data from past experience.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2020/11/im7.png" class="kg-image" alt="Charting The Stars"></figure><p>Let’s say the decision to play tennis depends entirely on four factors: the general weather outlook (sunny, overcast, or raining), the temperature (hot, mild, or cool), the humidity (high or normal), and the wind (strong or weak).</p><p>For training data, we've tracked the decisions that we made on previous days of tennis playing. For example, day nine, a cool but sunny day with normal humidity and weak winds, was a good day for tennis. Day 14, with rain and strong winds, was not a tennis day.</p><p>Based on this small data set, you could construct a decision tree by hand to work out when you should play tennis.</p><p>There are three options for the overall outlook: sunny, overcast, or rain. Based on this training data, we always play tennis when it's overcast, so that branch of the tree goes directly to a final leaf which assigns the classification "play". If it's sunny, then our decision depends on the humidity: sunny days with normal humidity are good for tennis, but sunny days with high humidity are not. Finally, if it's raining, our decision depends on the wind.
On rainy days with weak wind, we can still play tennis, but rain plus strong winds are not enjoyable.</p><p>So that's how we could build this decision tree for this simple data set by hand. But how does the machine learning algorithm know which attributes to select at each level of the tree?</p><p>Informally, for each decision in the tree, the learner chooses a question that maximizes the new information or minimizes the error. Different algorithms use different metrics to measure this information gain.</p><p>Entropy measures how predictable or uncertain a distribution is. If the outlook is overcast, we always play tennis, so we have complete certainty and the entropy equals zero. If the outlook is rain, we only predict we'll play tennis 60% of the time, giving an entropy close to one. Information gain measures how much entropy is reduced by answering a particular question.</p><p>Once the algorithm has built the tree, you then apply this model to unknown data. In real-world machine learning problems, you typically have thousands of training instances, making your predictions more robust.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2020/11/im8.png" class="kg-image" alt="Charting The Stars"></figure><p>Now, in this example, all of the features have been categorical. The temperature could be hot, mild, or cool. Nothing else.</p><p>But in astronomy, most of our data will be real-valued.</p><p>Rather than having cool, mild, and hot, we now have temperatures ranging from 15 to 30 degrees Celsius, or about 60 to 90 degrees Fahrenheit. 
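</p><p>As an aside, the entropy and information-gain calculation described above can be reproduced in a few lines of Python. This is a toy illustration using the classic play-tennis counts; it is not taken from any particular library:</p>

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Play-tennis outcomes grouped by the "outlook" attribute
# (counts follow the classic 14-day toy data set).
overcast = ["yes"] * 4                # always play
rain     = ["yes"] * 3 + ["no"] * 2   # play 60% of the time
sunny    = ["yes"] * 2 + ["no"] * 3

everything = overcast + rain + sunny
groups = [overcast, rain, sunny]

# Information gain = entropy before asking - weighted entropy after.
before = entropy(everything)
after = sum(len(g) / len(everything) * entropy(g) for g in groups)
print(f"entropy(overcast) = {entropy(overcast):.3f}")  # 0.000
print(f"entropy(rain)     = {entropy(rain):.3f}")      # ~0.971
print(f"gain(outlook)     = {before - after:.3f}")
```

<p>The overcast branch has zero entropy, so asking about the outlook removes a lot of uncertainty, which is why it sits at the root of the tree. Real data, as noted, is often not categorical at all.</p><p>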
In this case, the learner follows roughly the same process, except that it needs to make decisions in the real-valued temperature space.</p><p>Instead of splitting by a particular value or category, a decision tree with continuous variables needs to learn "less than" and "greater than" rules such as "if the temperature is greater than 80, take the left branch."</p><p>Supervised learning can be used for either classification or regression. This tennis tree is an example of classification because the results are distinct categories, but we can also learn a decision tree with real-valued results. One application of this method is using decision tree regression to calculate the red shift of galaxies.</p><h2 id="ensemble-classification">Ensemble Classification</h2><p>The decision tree approach is great, but in practice, it can be hard to get accurate results because of noise in the data, whether from the telescopes themselves or from Galaxy Zoo classifications.</p><p>So, in order to increase the reliability of the program, we use a technique called ensemble learning, where multiple models are trained and their results are combined in some way.</p><p>The premise of ensembling is that the combined classifications of multiple weak learners can give a more reliable and robust result than a single model's prediction in the face of imperfect training data.</p><p>For ensembles to work better than individual classifiers, the members of the ensemble must be different in some way. They must give independent results. If they all gave the same predictions on each instance, then having more of them would make no difference at all.</p><p>There are many ways to achieve this model independence, including using different machine learning algorithms, different parameters (for example, the tree depth), or different training data.</p><p>One of the most popular methods is called bootstrap aggregating, or bagging for short. 
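</p><p>The core operation here, drawing a bootstrap sample, takes only a couple of lines with Python's standard library (a sketch; the tiny training set is hypothetical):</p>

```python
import random

random.seed(42)  # reproducible demo

training_set = ["day1", "day2", "day3", "day4", "day5", "day6"]

# Sample *with replacement*: some days appear twice, some not at all,
# so each model in the ensemble sees a slightly different data set.
def bootstrap_sample(data):
    return [random.choice(data) for _ in range(len(data))]

samples = [bootstrap_sample(training_set) for _ in range(3)]
for s in samples:
    print(s)
```

<p>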
In bagging, samples of the training data are selected with replacement from the original training set, and a model is trained on each sample. Bagging makes each training set different, with an emphasis on different training instances.</p><p>A random forest classifier is a supervised machine learning algorithm that uses an ensemble of decision tree classifiers. It builds the ensemble by randomly selecting either subsets of training instances (bagging) or a subset of the features at each decision point.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2020/11/im9.png" class="kg-image" alt="Charting The Stars"><figcaption>Visualization of ensembling</figcaption></figure><p>The nice thing about a random forest is that we can use it pretty much everywhere we use a decision tree. Yes, it usually involves setting more parameters and is more computationally intensive, but it usually gives us better results.</p><p>Because each classifier in the ensemble returns a vote, we can not only find the most popular category but also look at the distribution of votes over all the categories. This allows us to calculate an estimated probability that our classification is correct and to identify which test instances are particularly hard for the system to classify. These probabilities can also let you identify outliers: objects that are not similar to any of the training classes, which can be a great way of finding those rare objects that need further scientific investigation.</p><h2 id="conclusion">Conclusion</h2><p>With the help of computers, the process of interpreting astronomical data has become significantly easier and more efficient. The real takeaway here is how useful it is for ideas from one field to bleed into another.</p><p>When Galileo first pointed those aligned lenses to the sky, he saw the stars. 
But today, thanks to statistics, machine learning, and advances in computation, we see more of what this universe has to offer.</p><p>Much, much more...</p><!--kg-card-begin: markdown--><blockquote>
<p>It’s not that there are things that science can’t explain. You look for the rules behind those things. Science is just a name for the steady, pain-in-the-ass efforts that go behind it.</p>
</blockquote>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Hyperbolic Functions and Non-Hyperbolic Claims]]></title><description><![CDATA[The traditional trigonometry functions are based on the coordinates of a circle. But what if we use something else? What we get is a set of very interesting functions that connect complex numbers, conic sections, and hanging ropes.]]></description><link>https://elliptigon.com/hyperbolic-functions-explained/</link><guid isPermaLink="false">5d0d1df839bad75541bdd31b</guid><category><![CDATA[Hyperbolic Functions]]></category><category><![CDATA[Functions]]></category><category><![CDATA[Trigonometry]]></category><category><![CDATA[Catenary]]></category><category><![CDATA[Hyperbola]]></category><category><![CDATA[Complex Numbers]]></category><dc:creator><![CDATA[Tarun Prasad]]></dc:creator><pubDate>Sun, 30 Jun 2019 16:37:20 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1500841531879-56b125c11c54?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1500841531879-56b125c11c54?ixlib=rb-1.2.1&q=80&fm=jpg&crop=entropy&cs=tinysrgb&w=1080&fit=max&ixid=eyJhcHBfaWQiOjExNzczfQ" alt="Hyperbolic Functions and Non-Hyperbolic Claims"><p>Trigonometry. The high-school mathematician's nightmare.</p><p>When we first learn about trigonometry in school, it’s introduced as being all about right-angled triangles—it is after all the ratios of two sides of a right triangle, isn’t it? Soon, we realize that it isn’t, in fact, limited to just triangles; we redefine all our notions of trigonometry with the help of a unit circle. 
Specifically, we learn that if we were to parametrize the unit circle, $x^2 + y^2 = 1$, the coordinates of a general point on it can be represented as $x=\cos t$ and $y=\sin t$, with $t$ being the parameter.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/Unit_circle.svg" class="kg-image" alt="Hyperbolic Functions and Non-Hyperbolic Claims"><figcaption>By Gustavb - Own work, CC BY-SA 3.0, <a href="https://commons.wikimedia.org/w/index.php?curid=584176">Link</a></figcaption></figure><p>But why stop with a circle? Why can’t we extend these ideas of parametrization to another common conic—the hyperbola? Trust me, by the time you get to the end of this article, you’ll agree with my claim that the hyperbola, a curve that doesn’t seem to show up much in our daily life, is actually much more common than we think. </p><p>Before we jump into this, however, let’s first play around with our traditional trigonometric functions: sine and cosine.</p><p>Euler’s formula, $e^{it} = \cos t + i\sin t$, gives us a neat, perhaps unexpected, relationship between the trigonometric functions and the exponential function. If we rearrange the terms a little, we arrive at the following results:</p><p>$$\cos t = \frac{e^{it} + e^{-it}}{2}$$</p><p>$$\sin t = \frac{e^{it} - e^{-it}}{2i}$$</p><p>This seems a little odd; even if we just provide purely real values of $t$ as the input to the functions, the right-hand side seems to have something to do with imaginary numbers, although it does eventually simplify to give us real-valued outputs.</p><p>Also, as expected, these definitions of sine and cosine do indeed satisfy the equation $x^2 + y^2 = 1$. Don’t take my word for it. 
Pick up a pen and a piece of paper and square and add the functions and see for yourself.</p><p>So what if, just what if, we defined another pair of analogous functions based on these representations of sine and cosine, but this time, with all the imaginary units on the right-hand side removed?</p><p>Let’s arbitrarily call these functions $\cosh$ (pronounced like <em>kosh</em> in kosher) and $\sinh$ (pronounced sinch or shine) respectively.</p><p>$$\cosh t = \frac{e^{t} + e^{-t}}{2}$$</p><p>$$\sinh t = \frac{e^{t} - e^{-t}}{2}$$</p><p>"Okay, but what does the $h$ stand for?" you ask? To find out, repeat the pen-and-paper process, but this time square and <em>subtract</em> the two instead. You should see, on simplifying, that $\cosh^2 t - \sinh^2 t = 1$.</p><p>It’s starting to make a little sense now, isn’t it?</p><p>If $x = \cosh t$ and $y = \sinh t$ are the parametric coordinates of a curve, the locus of all such points gives rise to the equation $x^2 - y^2 = 1$, which is simply the equation of a unit hyperbola! This is the very reason we call these functions the hyperbolic cosine and the hyperbolic sine.</p><p>We can then go on to define other hyperbolic functions like $\tanh$ and $\text{sech}$ just like we did for their circular counterparts. These functions satisfy identities analogous to those of the ordinary trigonometric functions (which I would encourage you to derive).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/Hyperbolic-functions-2.PNG" class="kg-image" alt="Hyperbolic Functions and Non-Hyperbolic Claims"><figcaption>Hyperbolic functions: sinh, cosh, and tanh</figcaption></figure><h2 id="circular-analogies">Circular Analogies</h2><p>Looking back at the traditional circular trigonometric functions, they take as input the angle subtended by the arc at the center of the circle. 
Similarly, the hyperbolic functions take a real value called the <strong>hyperbolic angle</strong> as the argument. To understand hyperbolic angles, we first need to think about traditional angles in a slightly different way.</p><p>If an arc of a unit circle subtends an angle of $\theta$ radians, then the area of the corresponding sector is $\frac{\theta}{2\pi} \times \pi$ or $\frac{\theta}{2}$. In other words, the angle is equal to twice the area of the sector.</p><p>Analogous to this, a hyperbolic angle is twice the area of the corresponding <strong>hyperbolic sector </strong>(which, like its circular counterpart, is simply the region enclosed by rays from the origin to two points on the hyperbola). This means that if you choose a point ($\cosh t$, $\sinh t$) on the unit hyperbola, the line segment joining the point with the origin creates a sector of area $\frac{t}{2}$ with the x-axis and the hyperbola.</p><!--kg-card-begin: html--><center><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/441px-Hyperbola-hyperbolic_functions-1.png" class="kg-image" alt="Hyperbolic Functions and Non-Hyperbolic Claims"><figcaption>Hyperbolic sector and its relation to the hyperbolic angle. The original uploader was Olympic at Ukrainian Wikipedia. <a href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a>, <a href="https://commons.wikimedia.org/wiki/File:441px-Hyperbola-hyperbolic_functions.png">via Wikimedia Commons</a>.</figcaption></figure><!--kg-card-begin: html--></center><!--kg-card-end: html--><p>The main point to keep in mind here is to visualize hyperbolic angles with the help of areas, not as a figure created by two rays as you might imagine normal angles. 
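</p><p>This area picture is easy to check numerically. The sketch below (my own Python illustration, using plain trapezoidal quadrature) computes the sector area as the triangle under the ray minus the region under the hyperbola, and recovers $\frac{t}{2}$:</p>

```python
import math

def hyperbolic_sector_area(t, steps=100_000):
    """Area swept between the x-axis, the ray to (cosh t, sinh t),
    and the unit hyperbola x^2 - y^2 = 1."""
    x, y = math.cosh(t), math.sinh(t)
    triangle = 0.5 * x * y
    # Area under the hyperbola y = sqrt(u^2 - 1) from u = 1 to u = x,
    # approximated by the trapezoidal rule.
    h = (x - 1) / steps
    us = [1 + i * h for i in range(steps + 1)]
    f = [math.sqrt(u * u - 1) for u in us]
    under_curve = h * (sum(f) - 0.5 * (f[0] + f[-1]))
    return triangle - under_curve

t = 1.5
print(hyperbolic_sector_area(t))  # very close to 0.75, i.e. t/2
```

<p>Twice the area gives back the hyperbolic angle, exactly as promised.</p><p>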
It also follows that hyperbolic angles are unbounded: the area of the sector keeps increasing as the point moves farther and farther away from the origin, which is not the case for circular angles.</p><h2 id="why-should-i-care">Why Should I Care?</h2><p>Take a piece of rope and suspend it from two rigid stands. What geometric shape do you observe the rope taking up?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/Rope.jpg" class="kg-image" alt="Hyperbolic Functions and Non-Hyperbolic Claims"><figcaption>Take this rope barrier around the tomb of Spanish architect Antoni Gaudí, for example. By Bocachete [Public domain], via <a href="https://commons.wikimedia.org/wiki/File:Gauditomb.jpg">Wikimedia Commons</a></figcaption></figure><p>If you said parabola, I don’t blame you; I thought so too at first. And so did Galileo! But as it turns out, this shape is actually a <strong>catenary</strong>, the shape of the hyperbolic cosine function. It appears strikingly similar to a parabola, but it isn’t quite the same.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/image-14.png" class="kg-image" alt="Hyperbolic Functions and Non-Hyperbolic Claims"><figcaption><a href="https://proxy.duckduckgo.com/iu/?u=https%3A%2F%2Fqph.fs.quoracdn.net%2Fmain-qimg-5797c756713d33d097be82180809944a&amp;f=1">Source</a></figcaption></figure><p>To prove this, let’s consider a point $P(x,y)$ on the rope, separated from the bottom-most point $O$ on the rope by an arc of length $l$. 
Let $\lambda$ be the rope’s linear mass density.</p><p>If we consider the section of the rope between $O$ and $P$, it is in equilibrium due to three forces: the two tensions ($T_0$ and $T_P$) pulling it outwards at either end and gravity pulling it downwards.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2019/07/Catenary.png" class="kg-image" alt="Hyperbolic Functions and Non-Hyperbolic Claims"></figure><p>As the section is in equilibrium, the net forces in the horizontal and vertical directions must each be zero. Hence we obtain the following relations:</p><p>$$W = \lambda lg = T_P \sin \theta$$</p><p>$$T_0 = T_P \cos \theta$$</p><p>This in turn gives:</p><p>$$\tan \theta = \dfrac{\lambda lg}{T_0}$$</p><p>In the right hand side of the above equation, the terms $\lambda$, $g$ and $T_0$ are all constants, so we can replace them with a single constant $a = \frac{\lambda g}{T_0}$. Also, $\tan \theta$ is nothing but the slope of the curve, $\frac{dy}{dx}$. Hence:</p><p>$$\dfrac{dy}{dx} = al$$</p><p>Differentiating this, we get:</p><p>$$\dfrac{d^2y}{dx^2} = a \dfrac{dl}{dx}$$</p><p>The small arc length $dl$ can be considered to be approximately a straight line segment, and hence the hypotenuse of a right triangle, with the other two sides being $dx$ and $dy$. 
Using the Pythagoras Theorem on this triangle yields the following relation:</p><p>$$dl^2 = dx^2 + dy^2$$</p><p>Dividing both sides by $dx^2$ and then taking the square root on both sides gives the following expression for $\frac{dl}{dx}$ which we can then substitute in the previous equation:</p><p>$${\left(\dfrac{dl}{dx}\right)}^2 = 1 + \left(\dfrac{dy}{dx}\right)^2$$</p><p>$$\dfrac{dl}{dx} = \sqrt{1 + \left(\dfrac{dy}{dx}\right)^2}$$</p><p>$$\dfrac{d^2y}{dx^2} = a \sqrt{1 + \left(\dfrac{dy}{dx} \right)^2}$$</p><p>This is nothing but a second order differential equation, which we can solve by making the substitution $z = \frac{dy}{dx}$, and then using the variable separable method.</p><p>$$\dfrac{dz}{dx}= a \sqrt{1+z^2}$$</p><p>$$\int \dfrac{dz}{\sqrt{1+z^2}} = \int {a} {dx}$$</p><p>$$\ln (z + \sqrt{1+z^2}) = ax + c$$</p><p>When $x = 0$, $z = \frac{dy}{dx} = \tan 0 = 0$, and we can hence safely set $c$ to 0. From here, it's just a matter of simplifying the equation and finding the value of $z$ as:</p><p>$$z = \dfrac{dy}{dx} = \dfrac{e^{ax}-e^{-ax}}{2}$$</p><p>The right hand side of this equation is starting to look a little familiar, isn't it? We're getting close to the answer! The last step is to once again solve the above differential equation (the constant of integration can be ignored as we are free to move the coordinate axes and set them to a position at which $c = 0$).</p><p>$$\int dy = \int \dfrac{e^{ax}-e^{-ax}}{2} dx$$</p><p>$$y = \dfrac{e^{ax}+e^{-ax}}{2a} = \dfrac{\cosh {ax}}{a}$$</p><p>And there you have it! We have just successfully shown that the rope does take the shape of a catenary and not a parabola, with the constant $a=\frac{\lambda g}{T_0}$ determining the exact shape.</p><p>So hyperbolas and hyperbolic functions do indeed manifest in various (albeit somewhat hidden) ways in everyday life. 
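</p><p>If you'd like a computer to double-check the algebra, a central finite difference can verify that $y = \frac{\cosh ax}{a}$ really does satisfy $\frac{d^2y}{dx^2} = a\sqrt{1 + \left(\frac{dy}{dx}\right)^2}$ (a quick Python sanity check; the values of $a$ and $x$ are arbitrary):</p>

```python
import math

a = 2.0           # stands in for lambda * g / T0; any positive value works
x, h = 0.7, 1e-4  # test point and finite-difference step

y = lambda x: math.cosh(a * x) / a  # the claimed solution

# Central differences for the first and second derivatives.
dy  = (y(x + h) - y(x - h)) / (2 * h)
d2y = (y(x + h) - 2 * y(x) + y(x - h)) / h**2

lhs = d2y
rhs = a * math.sqrt(1 + dy**2)
print(abs(lhs - rhs))  # tiny: the catenary satisfies the rope equation
```

<p>A parabola with matching curvature at the lowest point fails the same check everywhere except $x = 0$, which is why it merely resembles a catenary.</p><p>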
Of course, dangling ropes and barricade poles are just some examples of where you might encounter these functions.</p><p>They also happen to be used in machine learning and neural network applications, as well as in non-Euclidean geometry, for example in maps that use the Mercator projection (look up the sigmoid function and the Gudermannian function to find out more). My initial claim that hyperbolas are much more common than we think is therefore clearly no hyperbole.</p><p>Q.E.D.</p>]]></content:encoded></item><item><title><![CDATA[Walking on Water]]></title><description><![CDATA[Visiting the various methods of flotation to walk on water]]></description><link>https://elliptigon.com/walking-on-water/</link><guid isPermaLink="false">5d0d1df839bad75541bdd31a</guid><category><![CDATA[Flotation]]></category><category><![CDATA[Surface Tension]]></category><category><![CDATA[Basilisk Lizard]]></category><category><![CDATA[Usain Bolt]]></category><category><![CDATA[Speed]]></category><category><![CDATA[Archimedes Principle]]></category><category><![CDATA[Fluids]]></category><category><![CDATA[Illusion]]></category><dc:creator><![CDATA[Shanmugha Balan]]></dc:creator><pubDate>Fri, 08 Mar 2019 08:24:37 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2019/06/Walk_on_water.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://elliptigon.com/content/images/2019/06/Walk_on_water.jpg" alt="Walking on Water"><p>Do you remember tossing objects in water to see if they would float? Floating has to do with the density of water and the density of the object you are chucking into it. The density of an average human is $945-985 \hspace{0.1cm} kg \hspace{0.1cm} m^{-3}$. And the density of seawater is slightly above $1000 \hspace{0.1cm} kg \hspace{0.1cm} m^{-3}$, which means a human can happily float. That's how we swim. 
But there's a vital difference between walking on water and swimming in water.</p><p>The fraction of the volume of a substance that shows up above a liquid when dunked in it is given by</p><p>$$V_{above} = \frac{d_0 - d}{d_0}$$</p><p>$d_0$ is the density of the liquid (here, water) and $d$ is the density of the substance being dunked in (here, a human). The amazing thing about this is that it doesn't depend on the mass of the substances. The not-so-amazing thing about this is that just around $0.055$ times your volume will be afloat. So how on Poseidon do you walk on water?</p><p>One. Fake It. You could always Photoshop a normal picture of you standing into a normal picture of you standing on water. But that's not so nice.</p><p>Two. Fake It. You can try to create an illusion, to make it seem as if you are walking on water. You can change the substance you are walking on, you can shift the angle from which people see you, you can play tricks on people's minds by influencing what they perceive. But that's not so nice.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2019/06/illusion-1.png" class="kg-image" alt="Walking on Water"></figure><p>Three. Fake It. With Science. Try dropping a needle into water, point down. It effortlessly pierces through it and sinks. Now dry it and place it horizontally. It probably will sink (unless you've got really dexterous fingers). To definitely make it float, grab a small piece of absorbent paper (bigger than the needle) and place the needle on top of it. Now place the system gently on water. The paper absorbs the water, becomes heavy (denser) and sinks. But the needle, which is made of metal and whose density is higher than water's, somehow floats! This is because of a phenomenon called surface tension. 
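</p><p>To get a feel for why surface tension can hold up a needle but never a person, here is a rough order-of-magnitude comparison in Python (the masses, lengths, and the surface tension coefficient are illustrative assumptions, not measured values):</p>

```python
# Rough comparison: flotation fraction, and surface tension vs weight.
# All numbers below are illustrative, order-of-magnitude assumptions.
g = 9.8        # m/s^2
gamma = 0.072  # N/m, approximate surface tension of water

# Fraction of volume above water: (d0 - d) / d0
d_water, d_human = 1000.0, 945.0  # kg/m^3
fraction_above = (d_water - d_human) / d_water
print(f"fraction of a human above water: {fraction_above:.3f}")  # ~0.055

# Needle: ~0.5 g, resting along two ~5 cm contact lines
needle_weight = 0.0005 * g             # ~0.005 N
needle_support = gamma * 2 * 0.05      # ~0.007 N
print(needle_support > needle_weight)  # surface tension can hold it

# Human: ~80 kg on two feet, ~0.8 m of contact perimeter each
human_weight = 80 * g                  # ~784 N
human_support = gamma * 2 * 0.8        # ~0.12 N
print(human_support > human_weight)    # not even close
```

<p>At needle scale the surface tension force and the weight are comparable; at human scale, weight wins by a factor of thousands.</p><p>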
This delicate phenomenon is the cause of water rising up against the pull of gravity in plants (but it goes under a different name, capillarity).</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2019/06/Compass-floating_needle_on_water-1.jpg" class="kg-image" alt="Walking on Water"></figure><p>If you look closely, you can spot that the water is slightly warped, curved near the needle-water interface. This is because water molecules try to minimize the liquid's exposed surface area. If you have a bottle with a narrow lid, you can add a few more drops of water after it's full, to get a "membrane" curved upwards. This is because the molecules inside are pulling on the ones outside, so that they don't fall. The needle also causes a disruption in the surface (like the extra drops of water) and hence, there's a tiny force (because of surface tension) which can support its tiny weight. But what about a full human? We gorge on delicacies, and our mass is too huge to be supported by this feeble force. That's not so nice.</p><p>Four. Fake It. With Science. How do boats float? They displace water. This is what a naked genius (almost all of whose ideas have been superseded except, obviously, this one and a few others) figured out about two millennia ago. He had a remarkably scientific mind for his time. He made astute observations and most of them made sense with the available tools of the time. The advancement of technology has made us look at his ideas as outdated, but this elegantly simple equation helps boats stay afloat.</p><p>$$B = W$$</p><p>For those of you who think that's too simple, let me deconstruct it a bit. $B$ is the buoyant force, the upthrust experienced by an object in water. 
$W$ is the weight of the volume of water displaced by immersing the substance.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2019/06/2000px-Archimedes-principle.svg-1.png" class="kg-image" alt="Walking on Water"></figure><p>For boats, only a part of the body is immersed under water: a certain fraction of the hull. To find that fraction, we'll have to balance the boat's weight against the forces exerted by the water. But wait a minute... That gives us the first equation in this article. So no matter how much you cry over Archimedes' grave, there's only 5.5% of your body volume he can levitate above water. That's not so nice.</p><p>Five. Fake It. With Science. Allow me to creep you out with this:</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2019/06/Basilisk_Lizard1_1.jpg" class="kg-image" alt="Walking on Water"></figure><p>This bad boi right here, he can run on water. Not walk, but run (walking is harder, in case you didn't get that). How does it do it? Well, it slaps the water with its foot. Then it pushes it down into the water very quickly. This creates an air cavity above its foot. Before water can fill that cavity, it nicks its foot away to avoid water drag, which will not only decrease its velocity, but also drag it under. During this full action, which is one step, the upward impulse on the lizard from the water should match gravity's downward pull. This is managed by its special webbed feet, which help it retain that air cavity when it needs it and give it the ability to slap the water hard enough to produce that upward impulse. These lizards are very lightweight, weighing (rather, containing) only around 150 grams. They flit across the surface of water at a speed of $6 \hspace{0.1cm} m \hspace{0.1cm} s^{-1}$, which is impressive for such a tiny creature. Some lizards can sustain these speeds for almost 100 m. 
These lizards would finish a normal 100 m sprint<em> on water </em>in about 20 seconds, which is quite respectable considering its teeny stride. Shame on us with our humongous stride advantage. So, since we are too slow to do this, let's call Usain Bolt to do it for us. I have no clue how heavy he is, so I'll take that as the mass of an average human, around 80 kg. What do I tell him to do? I tell him to wear webbed shoes to help him slap the water properly. Then, I tell him to run just three times faster than he ran in Berlin in 2009. But what does he tell me? That's not so nice.</p>]]></content:encoded></item><item><title><![CDATA[Inside Out]]></title><description><![CDATA[A not-so-deep dive into the Mandelbrot set.]]></description><link>https://elliptigon.com/inside-out/</link><guid isPermaLink="false">5d0d1df839bad75541bdd318</guid><category><![CDATA[Complex Numbers]]></category><category><![CDATA[Complex Analysis]]></category><category><![CDATA[Math]]></category><category><![CDATA[Set Theory]]></category><category><![CDATA[Mandelbrot Set]]></category><category><![CDATA[Beauty]]></category><category><![CDATA[Art]]></category><dc:creator><![CDATA[Ajay Uppili Arasanipalai]]></dc:creator><pubDate>Tue, 05 Mar 2019 13:40:44 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2019/06/Mandel_zoom_00_mandelbrot_set.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://elliptigon.com/content/images/2019/06/Mandel_zoom_00_mandelbrot_set.jpg" alt="Inside Out"><p>If you've spent any time at all on the math-y side of the internet, you've probably seen something that looks like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/Mandelbrot_sequence_new.gif" class="kg-image" alt="Inside Out"><figcaption><a href="https://upload.wikimedia.org/wikipedia/commons/a/a4/Mandelbrot_sequence_new.gif">Source</a></figcaption></figure><p>Accompanied by the glorious infinity zooming 
psychedelic imagery is usually an eloquent description of how this flowering fractal provides indisputable proof of the existence of a supernatural creator of the grand design.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/image-2.png" class="kg-image" alt="Inside Out"><figcaption><a href="https://vignette.wikia.nocookie.net/beyond-universe/images/e/e7/Thecreator.jpg/revision/latest?cb=20180402160004">Source</a></figcaption></figure><p>Then the name drops: "The Mandelbrot Set."</p><p>Pure Enlightenment.</p><p>But wait a second. Set? As in, a set-theory set? You know, back in my day, sets looked more like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/image-3.png" class="kg-image" alt="Inside Out"><figcaption><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Example_of_a_set.svg/1200px-Example_of_a_set.svg.png">Source</a></figcaption></figure><p>Yep. Despite all the flashy colors and trippy graphics, the Mandelbrot set is just that: a good ol' fashioned set.</p><p>But to the aspiring mathematician still stuck in high school, there's probably a lot of murky water out there.</p><p>For starters, if the Mandelbrot set is a set, what is it a set of? What does it contain? How is it defined? And why in the world does it look like something you'd get when you throw lightning bolts, snowflakes, and burnt onions into a smoothie machine?</p><h2 id="what-the-chef-actually-threw-into-the-smoothie-machine">What the chef actually threw into the smoothie machine</h2><p>So let's clear something up. The Mandelbrot set is a set of complex numbers. That's it. Nothing too fancy.</p><!--kg-card-begin: markdown--><p>It's a set of numbers of the form $ a+bi $ where $ a, b \in \mathbb{R}$ and $ i^2 = -1$.</p>
<!--kg-card-end: markdown--><p>The reason you're able to look at it on your computer (or phone, who am I to judge) screen is that you are looking at the complex plane.</p><p>All the points that lie in the Mandelbrot set are black. That's the big cardioid shape with the circles around it.</p><p>The points that lie outside the Mandelbrot set are colored in ways that make you actually want to look at the thing without yawning your face out.</p><p>I'll go into how exactly we color the points that are outside the set a bit later. But for now, remember that the black points are the in-the-set points.</p><h2 id="a-diy-guide-to-building-your-own-mandelbrot-set">A DIY guide to building your own Mandelbrot set</h2><p>So how do we decide which points go in, and which stay out?</p><p>There's a simple rule: a complex number $c$ is in the Mandelbrot set if $|z_n|$ does not diverge as $n$ approaches infinity, where $z_0 = c$ and</p><p>$$ z_n = {(z_{n-1})}^2 + c$$</p><p>In English, a complex number $c$ is in the set if, when you start with that number and keep hitting $ans^2 + c$ on your calculator, your calculator doesn't spit out an overflow error and burn to a crisp.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/image-4.png" class="kg-image" alt="Inside Out"><figcaption><a href="https://i.ytimg.com/vi/bX-MSwp8KYQ/maxresdefault.jpg">Source</a></figcaption></figure><h3 id="does-this-actually-work">Does this actually work?</h3><p>As a test case, consider $c = 1$:</p><p>$$ \implies z_1 = 1^2 + 1 = 2$$<br>$$ \implies z_2 = 2^2 + 1 = 5$$<br>$$ \implies z_3 = 5^2 + 1 = 26$$</p><p>So let's just spare the poor computer some RAM space and conclude that 1 isn't in the Mandelbrot set.</p><p>By the way, if you haven't noticed, I'm using $\mathbb{M}$ to represent the Mandelbrot set, since it just looks so cool.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/image-5.png" 
class="kg-image" alt="Inside Out"><figcaption><a href="https://i.imgflip.com/1d0jzv.jpg">Source</a></figcaption></figure><p>Technically, it's not a requirement that the sequence converges, only that it doesn't diverge. For example, it's entirely possible (and this actually does happen) that $z_n$ keeps oscillating between 2 numbers.</p><p>You know what? Let's actually try that out.</p><p>Start with $c = -1$, which I'm just going to tell you is in the Mandelbrot set.</p><!--kg-card-begin: markdown--><p>$$ z_n = {z_{n-1}}^2 +c $$<br>
$$\implies z_1 = (-1)^2 -1 = 0 $$<br>
$$\implies z_2 = (0)^2 -1 = -1 $$<br>
$$\implies z_3 = (-1)^2 -1 = 0 $$</p>
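<p>The same oscillation is easy to see by running the iteration in code. A minimal Python sketch (my illustration, not from the original article):</p>

```python
# Iterate z -> z^2 + c, starting from z_0 = c as in the worked examples.
def iterate(c, steps=6):
    z = c
    history = [z]
    for _ in range(steps):
        z = z * z + c
        history.append(z)
    return history

print(iterate(1))   # 1, 2, 5, 26, ... explodes off to infinity
print(iterate(-1))  # -1, 0, -1, 0, ... bounces forever, so -1 stays in
```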
<!--kg-card-end: markdown--><p>So using a starting point of $c = -1$, the iterations bounce, Bugs Bunny style, between $0$ and $-1$.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/tenor.gif" class="kg-image" alt="Inside Out"><figcaption><a href="https://media.tenor.com/images/e98cd749705c84c2670d815f633f5eb7/tenor.gif">Source</a></figcaption></figure><h2 id="the-best-mandelbrot-set-spotting-sites-for-tourists">The Best Mandelbrot Set Spotting Sites For Tourists</h2><p>Here's something fundamental about the Mandelbrot set that's also extremely obvious: you can see the whole thing. Why is this detail important? Take a look at it again:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/image-7.png" class="kg-image" alt="Inside Out"><figcaption><a href="https://upload.wikimedia.org/wikipedia/commons/2/21/Mandel_zoom_00_mandelbrot_set.jpg">Source</a></figcaption></figure><p>It looks like we could theoretically draw a circle around this thing. A circle that contains the entire Mandelbrot set. So what would the radius of this circle be?</p><p>In other words, I'm asking you to find a number $r$ such that $\forall c \in \mathbb{M}, |c| \leq r $</p><p>The answer is 2.</p><p>Why 2? Well, think about how we iterate the sequence: we square the previous value and add a constant.</p><p>Let's say you start at the origin of the complex plane and go to the point corresponding to the complex number $c$.</p><p>Squaring a complex number squares its magnitude (by DeMoivre's theorem). So to get to the next point, you'd have to square your magnitude (distance from origin) and move in the direction of $c$.</p><!--kg-card-begin: markdown--><p>In the most extreme case, let's say that $ z_1 = {c}^2 - c$. This means that you're stepping forward in the direction of ${c}^2$ and stepping backward in the direction of $c$.</p>
<!--kg-card-end: markdown--><p>If $|c|&gt;2$, squaring $c$ would result in more than doubling the magnitude of $c$, since $x^2 &gt; 2x \; \forall \; x&gt;2$ (divide both sides by $x$ and you're left with $x &gt; 2$).</p><p>So in a sense, $c$ wouldn't be strong enough to pull $c^2$ back if it were outside the disk centered at the origin with radius 2.</p><p>Of course, pushing, pulling, and stepping backward aren't things you do in math, so I'd encourage anyone reading this article to check out a formal proof/derivation of the "radius of escape" of the Mandelbrot set (if you're into that kind of stuff).</p><h2 id="unleash-your-inner-3-year-old">Unleash your inner 3-year-old</h2><p>If you've read this far, I'm assuming that you're waiting for the point where I just tell you how you're supposed to get this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://elliptigon.com/content/images/2019/06/image-8.png" class="kg-image" alt="A colorful rendering of the Mandelbrot set"><figcaption><a href="https://render.fineartamerica.com/images/rendered/default/poster/8/10/break/images/artworkimages/medium/1/wild-and-crazy-mandelbrot-set-full-of-energy-matthias-hauser.jpg">Source</a></figcaption></figure><p>Oh well. I did try to convince you that the math is more beautiful than the picture. 
But I guess numbers and powers just can't hold their own against neons and pixels.</p><p>The coloring is done by assigning a different color to each complex number on the plane depending on how fast it explodes to infinity.</p><p>Since most of this stuff is done by a computer anyway, you'd treat each pixel's coordinate as a complex number $c$, and iterate the sequence $z_n = {z_{n-1}}^2 + c$ from $z_0 = 0$, just like we've done a few times above.</p><p>Remember, since every point of the Mandelbrot set keeps its iterates within magnitude 2, if at any point during our iteration the magnitude of the number we calculate exceeds 2, we throw it away, saving computing time.</p><p>The number of iterations we do before throwing away a number (before its magnitude exceeds 2) is used to set colors, say 2 iterations for blue, 3 iterations for dark blue, and so on.</p><h2 id="conclusion">Conclusion</h2><p>I think that at the heart of the Mandelbrot set lies a fundamental misunderstanding.</p><p>We assume that it's all about the infinite. Infinite zooming, infinite fractal, infinite points.</p><p>But the very definition of the Mandelbrot set suggests that it's not about the infinite at all. It's about the finite.</p><p>It's not about the numbers that explode off to infinity, never to be seen again. It's about the numbers that stay within their tightly knit community of the disk of radius 2.</p><p>It's not about the flashy colors that show you everything outside the Mandelbrot set. It's about the darkness inside.</p><p>And the beauty lies not in the shiny graphics that make it look like a mystery beyond comprehension. The true beauty of the Mandelbrot set lies in the fact that we can understand it. 
In one simple equation:</p><p>$$ z_{n+1} = {z_n}^2 +c $$</p><p>It's not about what's on the outside; it's the inside that counts.</p>]]></content:encoded></item><item><title><![CDATA[Radio Call from Space Taxis]]></title><description><![CDATA[In 1965, a comet was emitting a constant frequency of around 1600 MHz. Was it aliens hitchhiking a ride to Earth?]]></description><link>https://elliptigon.com/radio-call-from-space-taxis/</link><guid isPermaLink="false">5d0d1df839bad75541bdd317</guid><category><![CDATA[Comets]]></category><category><![CDATA[Emission Lines]]></category><category><![CDATA[Stimulated Emission]]></category><category><![CDATA[Natural Laser]]></category><category><![CDATA[Space Signals]]></category><dc:creator><![CDATA[Shanmugha Balan]]></dc:creator><pubDate>Thu, 28 Feb 2019 08:54:03 GMT</pubDate><media:content url="https://elliptigon.com/content/images/2019/06/cometleft.jpg" medium="image"/><content:encoded><![CDATA[<figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2019/06/Lubienietsky_Comet_Crisis.jpg" class="kg-image" alt="Radio Call from Space Taxis"></figure><img src="https://elliptigon.com/content/images/2019/06/cometleft.jpg" alt="Radio Call from Space Taxis"><p>Comets have long been feared by humans as acts of God signaling impending doom. Multiple disasters were said to occur after the sighting of a comet, thanks to the human instinct to read a cause into whatever event preceded a disaster. Just when this myth was dispelled, spectroscopic analysis of the tail of Halley's Comet in 1910 revealed the presence of cyanogen, a highly toxic gas. Quacks used this to sell gas masks, "anti-comet pills" and "anti-comet umbrellas".</p><p>When all these myths were put down, in 1965, a comet was found to emit radio waves of around 1600 MHz. An uproar was created, with some theories claiming that aliens from other planets were using the comet as an efficient way to travel, harnessing the natural motion of the comet as a "space taxi". 
Other claims said that it was the emission line of a new form of matter unknown to humans, dubbed mysterium.</p><p>Further studies showed that all these claims were false. In 1967, it was firmly established that what actually happens is this:</p><p>As the comet approaches the Sun, it has to get warmer than the icy depths of the <a href="https://elliptigon.com/demolishing-the-aether/">aether</a> :) in the far reaches of our Solar System. This increased heat causes the ice in the comet to sublimate (in the near-vacuum of space, ice passes directly into vapor rather than melting), creating an atmosphere of water vapor around the icy dirty-snowball nucleus of the comet. The radiant Sun now photolyses the energetic vapors as</p><p>$$H_2O \longrightarrow H + OH$$</p><p>Here $OH$ is the neutral hydroxyl radical. When the comet is still quite far from the Sun, both the $E_1$ and $E_2$ excitations in $OH$ occur. But as it gets closer to the Sun, the $E_1$ excitation reduces. This is because the comet isn't a stationary body: there is relative motion between the comet and the Sun. So the Doppler effect caused by the comet's motion causes the Fraunhofer lines to overlap with the wavelength of the $E_1$ excitation. So, $E_2 \rightarrow E_1$ occurs.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2019/06/cometemission.jpg" class="kg-image" alt="Radio Call from Space Taxis"></figure><p>Chill a moment. What are Fraunhofer lines? Where does the Doppler effect come into this? Fraunhofer lines are absorption lines that appear in the optical spectrum of the Sun as dark bands. These lines happen to coincide with the emission lines of some elements when heated, because the dark lines are caused by absorption by those very elements in the solar atmosphere. The lines are simply the absence of certain wavelengths in the spectrum. When the comet moves, this causes a Doppler shift. 
The excited emission lines now overlap with the Fraunhofer absorption ones.</p><p>The energy of one of these emission lines of $OH$ was found to be $6.9 \times 10^{-6} \text{ eV}$. This corresponds to a frequency of about 1665 MHz:</p><p>$$\Delta E = h \nu \Rightarrow \nu = \frac{6.9 \times 10^{-6} \times 1.6 \times 10^{-19} \text{ J}}{6.626 \times 10^{-34} \text{ J s}} \approx 1665 \text{ MHz}$$</p><p>These lines were observed in the comet Kohoutek from December 1973 to January 1974. Other emissions have also been found, including those of $CH_3OH$ and $SiO$.</p><p>When NASA scientists directed a telescope towards Chryse Planitia (on Mars, near the landing site of Pathfinder), they discovered laser radiation in its atmosphere. This natural laser arises from the excitation of similar energy levels in $CO_2$, which is abundant in the Martian atmosphere. And so Mother Nature gives it back to us humans, having played with lasers before we even 'invented' them.</p><figure class="kg-card kg-image-card"><img src="https://elliptigon.com/content/images/2019/06/laser-lights-maze-577919.jpg" class="kg-image" alt="Radio Call from Space Taxis"></figure>]]></content:encoded></item></channel></rss>