Under-display cameras using active sensing
Placing the camera behind the display is of strong interest within the industry for several reasons. It would allow a cleaner industrial design that does not require camera windows or notches; it would let the display cover the full surface of the device; and it would improve video conferencing, because the camera would capture from nearer where the user is looking.
However, imaging through the display requires balancing two conflicting goals. On the one hand, the display needs to show a bright, high-quality image. On the other hand, enough light must pass through the display uninterrupted, in the opposite direction, for the camera to form an image. The more the design is optimized for one of these goals, the more the quality of the other tends to suffer.
Other work on the team investigated addressing this conflict with machine learning. However, for some displays, parts of the scene cannot be faithfully recovered by a machine-learning algorithm because too much information is missing from the captured image. Although a good machine-learning algorithm might produce a plausible reconstruction, for some applications (such as biometrics) it is essential that the image be derived from true, not synthetic, data. To address this need, this project investigates an optical, hardware-based method of through-display image recovery.
The problem of imaging through a display can be visualized by looking at a close-up of an example display that would be placed in front of the camera:
The picture above shows a 3×3 patch of pixels from a 4K-resolution transparent OLED display. Even though this display is designed to let light through, less than 20% of the area shown (the white regions) is open to do so. In addition, the shape and size of these openings block sets of rays from the scene, so the image captured by a typical camera would not only be blurry but, even worse, would be completely missing some spatial frequencies.
This can be seen by examining the modulation-transfer functions (MTF) of screens with progressively narrower open areas. The narrower the pixel opening, the more the valleys of the MTF curve are truncated:
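As a rough numerical illustration of this effect (not the measured curves shown here), the sketch below models the display as a one-dimensional periodic binary aperture in front of the camera pupil and computes the resulting MTF as the normalized autocorrelation of the pupil transmission. The 80-micron pixel pitch, the open fractions, and the 1-D simplification are assumptions chosen only for illustration.

```python
import numpy as np

def display_mtf(open_fraction, pitch_um=80.0, n_pixels=9, samples_per_um=4):
    """MTF of a camera whose pupil is masked by a 1-D periodic display aperture.

    For incoherent imaging, the OTF is the normalized autocorrelation of the
    pupil transmission and the MTF is its magnitude; spatial frequency at the
    sensor is proportional to the pupil shift.
    """
    n = int(n_pixels * pitch_um * samples_per_um)
    x = np.arange(n) / samples_per_um                  # position across the pupil, microns
    phase = (x % pitch_um) / pitch_um
    # Binary transmission: open (1) near the centre of each pixel, opaque (0) elsewhere.
    pupil = ((phase > (1 - open_fraction) / 2) &
             (phase < (1 + open_fraction) / 2)).astype(float)
    # Circular autocorrelation via the FFT, normalized so the MTF is 1 at zero shift.
    autocorr = np.fft.ifft(np.abs(np.fft.fft(pupil)) ** 2).real
    return np.abs(autocorr) / autocorr[0]

for frac in (0.6, 0.4, 0.2):
    mtf = display_mtf(frac)
    print(f"open fraction {frac:.1f}: minimum MTF = {mtf.min():.3f}")
# Openings narrower than half the pixel pitch drive the MTF valleys all the way
# to zero, so scene detail at those frequencies never reaches the sensor.
```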
This is a problem: because some spatial frequencies are missing entirely, no computational trick can boost them or otherwise recover them after the image has been captured. Imagine that we were trying to photograph a person wearing a striped shirt. If we couldn't see stripes of that width, then in the photograph the shirt (ignoring other distortions) would look like a plain solid shirt. We would have no way to know, from the captured image, whether the shirt was solid or striped, nor even that anything was missing.
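To make the shirt analogy concrete, the hypothetical snippet below builds a one-dimensional "striped" patch at a frequency assumed to fall in an MTF null, and an otherwise identical "solid" patch, then applies an idealized capture that zeroes only that frequency. The two captures come out indistinguishable; the specific frequency and the ideal-null model are illustrative assumptions, not values from the actual display.

```python
import numpy as np

x = np.linspace(0, 1, 1024, endpoint=False)
f_blocked = 64                                  # cycles across the patch, assumed nulled
striped = 0.5 + 0.5 * np.cos(2 * np.pi * f_blocked * x)
solid = np.full_like(x, 0.5)

def capture(scene):
    """Idealized through-display capture: an MTF of 1 everywhere except a null at f_blocked."""
    spectrum = np.fft.fft(scene)
    freqs = np.fft.fftfreq(scene.size, d=x[1] - x[0])
    spectrum[np.isclose(np.abs(freqs), f_blocked)] = 0
    return np.fft.ifft(spectrum).real

# The striped and solid patches produce the same captured image.
print(np.allclose(capture(striped), capture(solid), atol=1e-6))  # True
```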
The process
To try to “fill in” these missing frequencies that the camera can’t see because they are blocked by the display, we investigated whether the problematic frequencies could be translated within the scene itself (outside the device, before the light reaches the display) into a different frequency range that the display does not block, so that the translated frequencies could pass through the display and be captured by the camera. In this work, we performed this translation by projecting a pattern onto the scene that produces a moiré effect with the spatial frequencies already present there: information about the frequencies the camera cannot see is carried by the frequencies of the moiré pattern, and the moiré's frequencies lie within the range that can reach the camera.
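This frequency translation can be illustrated with a short simulation: multiplying a scene component by a sinusoidal illumination pattern creates sum and difference frequencies, and the difference (moiré) term lands in a lower band that the display passes. The frequencies below are arbitrary illustration values, not parameters of the actual system.

```python
import numpy as np

x = np.linspace(0, 1, 4096, endpoint=False)
f_scene = 220          # scene detail at a frequency the display is assumed to block
f_illum = 200          # frequency of the projected sinusoidal pattern (assumed)
scene = 1.0 + 0.5 * np.cos(2 * np.pi * f_scene * x)
illumination = 1.0 + np.cos(2 * np.pi * f_illum * x)

# Illumination multiplies reflectance, so the product's spectrum contains
# sum and difference frequencies f_scene +/- f_illum.
product = scene * illumination
spectrum = np.abs(np.fft.rfft(product)) / x.size
print(np.nonzero(spectrum > 0.05)[0])
# [  0  20 200 220 420]: the low 20-cycle moire term carries information
# about the blocked 220-cycle detail.
```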
However, we cannot simply project the pattern from in front of the display. After all, if the motivation for this work is to remove the camera from in front of the display, it would make no sense to replace it with another component (a light source) in the same place. Nor can the pattern simply be projected from behind the display, because the required pattern itself contains frequencies that the display would block on the way out of the device. Instead, the team formed the pattern from behind the display by creating two coherent point sources positioned within one display pixel (about 80 microns apart). The light from these sources interferes at the scene, and the resulting sinusoidal interference pattern aliases the problematic spatial frequencies in the scene down to a lower range that can pass through the display and be captured by the camera.
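For reference, two coherent point sources separated by a distance d produce Young's fringes whose period on a scene at distance z is approximately λz/d. The sketch below plugs in the 80-micron separation mentioned above; the wavelength and scene distance are purely illustrative assumptions, not values from the project.

```python
# Young's-fringe geometry: fringe period ~ wavelength * distance / source separation.
wavelength_m = 940e-9      # assumed near-infrared source
separation_m = 80e-6       # two coherent point sources within one display pixel
scene_distance_m = 0.4     # assumed distance to the subject

fringe_period_m = wavelength_m * scene_distance_m / separation_m
print(f"fringe period on the scene: {fringe_period_m * 1e3:.2f} mm "
      f"({1 / (fringe_period_m * 1e3):.2f} cycles/mm)")
```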
However, once the scene is illuminated this way, the camera sees these lower spatial frequencies (which come from the moiré and are not really in the scene) mixed together with real scene content that happens to lie at those same lower frequencies. To disentangle the two, and to shift the aliased frequencies back to where they originally were in the scene, the scene is imaged three times with different phase relationships between the point sources. This provides enough information to recover the original frequencies.
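A minimal one-dimensional sketch of this kind of three-phase recovery (in the spirit of structured-illumination demodulation, not the project's actual algorithm) is shown below: three simulated captures with fringe phases 0, 2π/3, and 4π/3 are solved per pixel for the unmodulated and moiré components, and the moiré term is shifted back up by the fringe frequency to recover the blocked scene detail. All frequencies and the ideal low-pass "display" are assumptions chosen for illustration.

```python
import numpy as np

x = np.linspace(0, 1, 4096, endpoint=False)
s0, a, psi = 1.0, 0.4, 0.7
f_s, f_i = 220, 200                     # blocked scene detail; fringe frequency (assumed)
scene = s0 + a * np.cos(2 * np.pi * f_s * x + psi)

def lowpass(signal, cutoff=100):
    """Idealized through-display capture: keep only frequencies below the cutoff."""
    spec = np.fft.fft(signal)
    freqs = np.fft.fftfreq(signal.size, d=x[1] - x[0])
    spec[np.abs(freqs) > cutoff] = 0
    return np.fft.ifft(spec).real

phases = np.array([0, 2 * np.pi / 3, 4 * np.pi / 3])
captures = np.stack([lowpass(scene * (1 + np.cos(2 * np.pi * f_i * x + p)))
                     for p in phases])

# Per-pixel linear system: I_k = A + B*cos(phase_k) + C*sin(phase_k),
# where A is the unmodulated image and (B, C) encode the moire component.
M = np.column_stack([np.ones(3), np.cos(phases), np.sin(phases)])
A, B, C = np.linalg.solve(M, captures)

# Shift the demodulated moire term back up by the fringe frequency.
recovered = A + 2 * np.real((B + 1j * C) * np.exp(1j * 2 * np.pi * f_i * x))
print(np.allclose(recovered, scene, atol=1e-6))   # True: the blocked detail is restored
```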
A more detailed discussion of the recovery process can be found in the linked publication.
The results
We can demonstrate the results by starting with a scene of a radial spoke pattern designed to have details at multiple spatial frequencies:
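For context, a radial spoke (Siemens-star) chart like this packs progressively higher spatial frequencies toward its centre, so a single image probes many frequencies at once. A minimal way to generate such a target (the size and spoke count below are arbitrary, not those of the chart used in the experiments) is:

```python
import numpy as np

def siemens_star(size=512, n_spokes=36):
    """Radial spoke test chart: spatial frequency increases toward the centre."""
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    angle = np.arctan2(y, x)
    star = 0.5 + 0.5 * np.cos(n_spokes * angle)
    star[np.hypot(x, y) > 1] = 1.0          # white background outside the unit circle
    return star

chart = siemens_star()
```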
Related projects
- Under-Display Camera with Machine Learning
- Camera in Display: Machine Learning and Embedded Cameras Make Possible a New Class of More Natural Videoconferencing Devices