Under-display cameras using machine learning
In devices used for videoconferencing, locating the camera behind the display would improve the experience by putting the camera's viewpoint nearer to where the user is looking, thereby recreating the eye contact of a face-to-face conversation. It would also allow the display to be larger, using the full surface of the device, and make for a cleaner design, uninterrupted by camera windows or notches.
However, capturing an image through the display entails addressing two objectives that are at odds with each other. First, the display must produce a bright, high-quality image. Second, enough light must pass through the display in the other direction for the camera to capture a clear image of the user. Unfortunately, improving the design for one of these goals tends to degrade the other. To sidestep this conflict, this project investigates whether these challenges can be mitigated computationally.
In addition to this software-based approach, the team has also investigated active sensing, an optical, hardware-based technique for improving through-screen image capture.
The difficulty of capturing an image through the screen is readily apparent when we look at close-ups of typical display panels, such as the T‑OLED and P‑OLED samples examined in this project: the surface is largely covered by pixel structures, leaving only small openings for light to reach the camera.
Making the problem even more challenging, these openings are small enough that they diffract the light passing through them, badly degrading the image and corrupting its spatial frequencies in unusual ways because of the openings' complex shapes.
Below are images of the point spread functions (PSFs) of the sample T‑OLED and P‑OLED displays mentioned above. A PSF describes what happens to a point source of light passing through the system (in this case, the displays). A perfect imaging system would leave the point source looking like a point in the PSF. Here, however, the single point of light is diffracted into many additional unintended points (called "side lobes") spread over the image. Since this happens for every source point in a real scene, the image blurs as these side lobes mix light into unrelated pixels across the sensor. In the PSFs shown, you can see how the differing structures of the two displays diffract the image differently.
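To make this blurring mechanism concrete, here is a minimal sketch (not the project's actual pipeline) that models the display as a binary aperture mask, computes a far-field (Fraunhofer) approximation of the PSF as the squared magnitude of the mask's Fourier transform, and convolves a scene with it. The mask pattern, grid size, and scene are illustrative assumptions.

```python
import numpy as np
from numpy.fft import fft2, ifft2, fftshift, ifftshift

N = 256  # simulation grid size (illustrative)

# Illustrative aperture mask: a mostly opaque panel with a periodic grid of
# small square openings, a crude stand-in for a real display pixel layout.
aperture = np.zeros((N, N))
for y in range(0, N, 16):
    for x in range(0, N, 16):
        aperture[y:y + 4, x:x + 4] = 1.0  # one 4x4 opening per 16x16 cell

# Fraunhofer (far-field) approximation: the PSF is proportional to the squared
# magnitude of the aperture's Fourier transform. A periodic grid of openings
# produces a grid of diffraction orders, i.e., the "side lobes" described above.
psf = np.abs(fftshift(fft2(aperture))) ** 2
psf /= psf.sum()  # normalize so the blur preserves total energy

# Blur a scene by convolving it with the PSF (circular convolution via FFT).
# Every scene point is spread into the side lobes, mixing light into
# unrelated pixels on the sensor.
scene = np.random.rand(N, N)  # placeholder scene; substitute a real image
blurred = np.real(ifft2(fft2(scene) * fft2(ifftshift(psf))))
```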
The process
To correct for these distortions, we investigated using supervised learning to train a neural network to produce the unobstructed image given the degraded image. To get data for this training, we constructed an imaging system that photographs a scene (an image on a separate monitor) with a camera twice: once while looking through a display sample and again while unobstructed. A network was trained for each of the two displays described above. (For a more detailed description of the recovery process, see the linked publication.)
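As a rough illustration of this supervised setup (not the network or training code used in the project, which is described in the linked publication), the sketch below trains a small convolutional network on paired degraded/clean images with an L1 loss. The architecture, loss, hyperparameters, and the random stand-in data are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for paired data: (image captured through the display,
# unobstructed image of the same scene). Real pairs would come from the
# two-shot capture rig described above.
degraded = torch.rand(64, 3, 128, 128)
clean = torch.rand(64, 3, 128, 128)
loader = DataLoader(TensorDataset(degraded, clean), batch_size=8, shuffle=True)

# Small convolutional restoration network (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)  # penalize deviation from the unobstructed image
        loss.backward()
        optimizer.step()
```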
The results
The recovered images were found to be sharper and less noisy with both T‑OLED and P‑OLED displays.
Raw images captured through the P‑OLED display were almost unrecognizable, but the recovery produced a striking improvement.
Across the data set, the neural network achieved a 200% gain in contrast, 97% image restoration, and a 10 dB gain in SNR, showing the potential of this approach.
Table 1. Modulation-transfer function (MTF) budget for camera placed under T‑OLED and under P‑OLED
| Attribute | T‑OLED (fraction) | P‑OLED (fraction) |
| --- | --- | --- |
| Contrast | 0.3 | 0.17 |
| NN gain | 3 | 5.29 |
| Total loss | 0.9 | 0.9 |
Table 2. SNR budget for camera placed under T‑OLED and under P‑OLED
| Attribute | T‑OLED (fraction) | T‑OLED (dB) | P‑OLED (fraction) | P‑OLED (dB) |
| --- | --- | --- | --- | --- |
| Transmission | 0.2 | -7.0 | 0.03 | -15.2 |
| NN gain | 1.58 | 2.0 | 10 | 10.0 |
| Total loss | 0.32 | -5.0 | 0.3 | -5.2 |
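The budget arithmetic appears to follow the usual conventions: the total in each column is the product of the display's loss fraction and the neural-network gain, and the decibel values in Table 2 correspond to 10·log10 of the fractions. The short check below reproduces that arithmetic from the table values; the combination rule is inferred from how the numbers fit together rather than stated in the tables.

```python
import math

# Loss fraction and NN gain pairs taken from Tables 1 and 2.
budgets = {
    "T-OLED MTF": (0.3, 3),
    "P-OLED MTF": (0.17, 5.29),
    "T-OLED SNR": (0.2, 1.58),
    "P-OLED SNR": (0.03, 10),
}

for name, (loss_fraction, nn_gain) in budgets.items():
    total = loss_fraction * nn_gain  # net fraction after NN recovery
    print(f"{name}: total = {total:.2f} ({10 * math.log10(total):+.1f} dB)")
```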
Related projects
- Under-Display Camera Using Active Sensing
- Camera in Display: Machine Learning and Embedded Cameras Make Possible a New Class of More Natural Videoconferencing Devices