A big prime p is chosen along with a primitive root g of p.
(This means that g^{x} mod p takes on every value from 1 through p−1 as x ranges from 1 through p−1.)
Everybody knows p and g.
When Alice and Bob want to communicate, they need a shared secret key, known only to them, to be used for conventional crypto such as AES.
Alice chooses a secret, a, between 0 and p−1 and Bob chooses another, b.
Alice sends g^{a} mod p to Bob and Bob sends g^{b} mod p to Alice, whereupon they respectively compute (g^{b})^{a} and (g^{a})^{b}, both modulo p. These two are the same number, g^{ab} mod p, which serves as the shared secret AES key.
At the end of the conversation they expunge their memories of a, b and the shared secret to achieve perfect forward secrecy.
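
A minimal sketch of the exchange above, assuming toy parameters (p = 23, g = 5) that are far too small for real use. The helper name `keypair` is hypothetical, and real systems would normally hash g^{ab} before using it as an AES key.

```python
import secrets

# Toy public parameters (an assumption for illustration): 5 is a primitive root of 23.
p, g = 23, 5

# Primitive-root check: g^x mod p hits every value 1..p-1 as x runs over 1..p-1.
assert {pow(g, x, p) for x in range(1, p)} == set(range(1, p))

def keypair(p, g):
    """Return (secret exponent, public value g^secret mod p)."""
    secret = secrets.randbelow(p - 3) + 2   # uniform in 2..p-2
    return secret, pow(g, secret, p)

a, A = keypair(p, g)   # Alice sends A = g^a mod p to Bob
b, B = keypair(p, g)   # Bob sends B = g^b mod p to Alice

alice_shared = pow(B, a, p)   # (g^b)^a mod p
bob_shared = pow(A, b, p)     # (g^a)^b mod p
assert alice_shared == bob_shared

# Dropping the names is only a gesture at expunging: Python gives no
# guarantee that the values are actually wiped from memory.
del a, b
```

The last two lines hint at why expunging is the hard part: the arithmetic is three calls to `pow`, while reliably forgetting a secret depends on the runtime and the operating system.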

This article suggests that some variants of the common web protocol TLS provide an “ephemeral Diffie-Hellman” mode. I wonder how well they manage to expunge their keys.

I had understood that Alice and Bob each choose their secrets, a and b, sometime after learning p and g but probably before either had heard of the other.
They publish g^{a} and g^{b} as their respective public keys in the phone book, and put them on their business cards and mailboxes.
When Alice wants to send a secret message to Bob she looks up g^{b}, sends g^{a} to Bob in the clear, and uses AES key = g^{ab} to encipher the secret message.
The leading g^{a} serves both to identify Alice to Bob and also allows Bob to compute g^{ab} in order to decipher the message.
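
A small sketch of this long-term-key variant, again with toy numbers. The dictionary `phone_book`, the function `shared_key`, and the fixed exponents 6 and 15 are assumptions for illustration. Hashing g^{ab} with SHA-256 to get a 32-byte key is also an added step; the text above uses g^{ab} directly.

```python
import hashlib

p, g = 23, 5   # toy public parameters, far too small for real use

# Long-term secret exponents (fixed here so the run is repeatable).
alice_secret, bob_secret = 6, 15

# Published public keys g^a mod p and g^b mod p.
phone_book = {
    "alice": pow(g, alice_secret, p),
    "bob": pow(g, bob_secret, p),
}

def shared_key(my_secret, their_public):
    """Compute g^{ab} mod p and hash it down to a 32-byte key."""
    shared = pow(their_public, my_secret, p)
    width = (p.bit_length() + 7) // 8
    return hashlib.sha256(shared.to_bytes(width, "big")).digest()

# Alice looks Bob up and derives the key before sending anything.
key_at_alice = shared_key(alice_secret, phone_book["bob"])

# The message travels with Alice's g^a in the clear as its leading field.
leading_value = phone_book["alice"]

# Bob derives the same key from the leading value alone.
key_at_bob = shared_key(bob_secret, leading_value)
assert key_at_alice == key_at_bob
```

Note that every message between the same pair uses the same key, which is exactly where forward secrecy is lost.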

In the latter scheme we lose forward secrecy but gain public keys. These public keys serve for authentication but not for signing, as far as I can see. The secret exponents must now be guarded long term, just like secret RSA keys.

There remains a problem. How is Bob to be sure that the message actually came from Alice? In other words, how is the secret authenticated? No one but Alice, who knows her exponent, and Bob are able to compute the AES key, but perhaps the encrypted message was not the output of an AES encryption. There is a substantial literature proposing how to make messages quantifiably redundant before encryption so that after decryption the redundancy indicating authenticity can be verified. There is a surprising counter-literature describing attacks on such redundancy schemes. I think that the authenticators win, but when a program is the recipient of a deciphered message with garbage, be prepared to reliably detect this and declare the message bogus. The Alice and Bob protocol above lacks such provisions. In a collaboration, include time stamps and serial numbers in the plaintext to guard against replay attacks. The real Diffie-Hellman has the same problem.
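
As a rough illustration of "detect this and declare the message bogus", here is a sketch that swaps the redundancy idea for a standard HMAC tag and adds a serial number against replay. The names `seal` and `Receiver` are hypothetical, the MAC key is assumed to be already shared, and encryption is left out to keep the check itself visible.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32      # HMAC-SHA256 output size
HEADER_LEN = 8    # 64-bit big-endian serial number

def seal(mac_key, serial, body):
    """Prefix a serial number and append an HMAC tag over header + body."""
    header = struct.pack(">Q", serial)
    tag = hmac.new(mac_key, header + body, hashlib.sha256).digest()
    return header + body + tag

class Receiver:
    def __init__(self, mac_key):
        self.mac_key = mac_key
        self.last_serial = -1   # highest serial accepted so far

    def open(self, packet):
        """Return the body, or raise ValueError if the packet is bogus."""
        if len(packet) < HEADER_LEN + TAG_LEN:
            raise ValueError("bogus: too short")
        header = packet[:HEADER_LEN]
        body = packet[HEADER_LEN:-TAG_LEN]
        tag = packet[-TAG_LEN:]
        expected = hmac.new(self.mac_key, header + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bogus: bad tag")
        (serial,) = struct.unpack(">Q", header)
        if serial <= self.last_serial:
            raise ValueError("bogus: replay")
        self.last_serial = serial
        return body
```

The tag is checked before the serial number is trusted, so a forged packet cannot advance `last_serial` and lock out genuine messages.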

The ‘real’ Diffie-Hellman protocol provides perfect forward secrecy; my variation does not.

An inelegant but vetted solution is to observe that p must be so big that values like g^{ab} suffice to provide both an encryption key and, separately, an authentication key for a message authentication code.
Better yet check out this protocol which does less symmetric crypto but still needs the double key material.
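
One way to get the double key material, sketched here as an assumption rather than as the method the linked protocol uses, is to hash g^{ab} twice with different labels. The function name `derive_keys` and the label strings are hypothetical.

```python
import hashlib

def derive_keys(shared_secret, p):
    """Derive separate encryption and authentication keys from g^{ab} mod p."""
    width = (p.bit_length() + 7) // 8            # bytes needed to hold any value mod p
    material = shared_secret.to_bytes(width, "big")
    # Distinct labels make the two keys independent-looking outputs of the hash.
    enc_key = hashlib.sha256(b"encrypt" + material).digest()
    mac_key = hashlib.sha256(b"authenticate" + material).digest()
    return enc_key, mac_key

# Toy usage: p = 23 and shared secret 2 are illustrative values only.
enc_key, mac_key = derive_keys(2, 23)
```

Slicing the bits of g^{ab} directly would follow the text more literally; hashing with labels is shown because it does not depend on how many bits the shared value happens to have.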

This works for the elliptic-curve version of DH too, but then the curve might not automatically be big enough to provide both keys.