Delaying camera feed with AVFoundation Framework

Emanuel G. Luayza
7 min read · Oct 30, 2020

For the last couple of weeks, I’ve been working on a project where I had to delay the camera feed of an iPhone by a certain number of milliseconds.

The AVFoundation framework allows us to customize and configure the camera’s settings. It’s a very powerful framework for all things audio/visual in iOS, and it gives you all the flexibility you could possibly want.

The process of creating a delay while keeping a great user experience was quite difficult, so I hope this tutorial saves a lot of headaches for developers who need to do the same.

Let’s dig in!

Allowing our App to access the camera

First of all, we have to allow our App to access the iOS camera. To do that, we add the NSCameraUsageDescription key to the Info.plist file along with a message that explains why we need camera access. This message is shown to the user when the app first requests the camera.
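The Info.plist entry is all this step requires. As an optional extra that is not part of the article’s original steps, you can also check the authorization status in code before starting the capture session. A minimal sketch, where the function name requestCameraAccessIfNeeded is just an example:

```swift
import AVFoundation

// Optional runtime check. Info.plist still needs the NSCameraUsageDescription
// key with a message explaining why the camera is used.
func requestCameraAccessIfNeeded(completion: @escaping (Bool) -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        completion(true)
    case .notDetermined:
        AVCaptureDevice.requestAccess(for: .video, completionHandler: completion)
    default:
        completion(false)
    }
}
```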

Creating the necessary properties

Now we have to import the framework and create the properties we need to work with the iOS camera (I recommend having a dedicated file to manage the camera logic, e.g. CameraFeed). A sketch of these properties follows the class list below.

We are going to use the following classes:

AVCaptureSession: An object that manages capture activity and coordinates the flow of data from input devices to capture outputs.

AVCaptureDevice: A device that provides input (such as audio or video) for capture sessions and offers controls for hardware-specific capture features.

AVCaptureDeviceInput: A capture input that provides media from a capture device to a capture session.

AVCaptureVideoDataOutput: A capture output that records video and provides access to video frames for processing.
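The original snippet isn’t embedded in this version of the article, but a minimal sketch of the class and its properties could look like the following (the class name CameraFeed comes from the article; the property names, and the NSObject superclass needed later for the sample buffer delegate, are assumptions):

```swift
import AVFoundation

// Manages all of the camera logic.
final class CameraFeed: NSObject {

    // Capture pipeline: session, device, input, and video output.
    private var captureSession: AVCaptureSession!
    private var captureDevice: AVCaptureDevice!
    private var captureDeviceInput: AVCaptureDeviceInput!
    private var captureVideoOutput: AVCaptureVideoDataOutput!
}
```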

Configuring Camera

To set up the camera, we have to initialize our properties with a configuration that depends on what we need.

Let’s create our first method, which sets up the capture session.
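The embedded code isn’t reproduced here; based on the explanation below, a sketch of such a method might look like this (the helper names setupCaptureDeviceInput and setupCaptureVideoOutput are assumptions, and beginConfiguration is added so the commitConfiguration call later on is valid):

```swift
private func setupSession() {
    // 1. Create the session and set the preset to HD.
    captureSession = AVCaptureSession()
    captureSession.beginConfiguration()
    captureSession.sessionPreset = .hd1280x720

    // 2. Look for the back wide-angle camera.
    let discoverySession = AVCaptureDevice.DiscoverySession(
        deviceTypes: [.builtInWideAngleCamera],
        mediaType: .video,
        position: .back
    )

    // 3. Take the first match and continue with the input and output setup.
    guard let device = discoverySession.devices.first else { return }
    captureDevice = device

    setupCaptureDeviceInput()
    setupCaptureVideoOutput()
}
```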

Let me explain this code a bit.

1- In the first lines, we initialize our captureSession property and set the sessionPreset (a constant indicating the quality level or bit rate of the output) to HD. There are more options to choose from; you can find them here.

2- Then we find the available capture devices using the DiscoverySession class. Here we specify the kind of device we want to use (in this case the wide-angle camera; we can find more options here), the media type (video), and the camera position (the back camera, which gives a better result).

3- We select the first match, assign it to our captureDevice property, and then call the next methods (we’ll create them in a moment).

Now that we have the captureSession configured and our captureDevice selected, we need to create the captureDeviceInput.
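Again as a hedged sketch of what this method might look like (the method name is an assumption, and the frame rate is set directly here instead of through the extension mentioned in step 3):

```swift
private func setupCaptureDeviceInput() {
    do {
        // 1. Create the input from the selected capture device.
        captureDeviceInput = try AVCaptureDeviceInput(device: captureDevice)

        // 2. Add it to the session if possible.
        if captureSession.canAddInput(captureDeviceInput) {
            captureSession.addInput(captureDeviceInput)
        }

        // 3. Choose the desired FPS (the linked extension wraps this logic;
        //    here the frame duration is set directly as an illustration).
        try captureDevice.lockForConfiguration()
        captureDevice.activeVideoMinFrameDuration = CMTime(value: 1, timescale: 30)
        captureDevice.activeVideoMaxFrameDuration = CMTime(value: 1, timescale: 30)
        captureDevice.unlockForConfiguration()
    } catch {
        // 4. Print the error if something goes wrong.
        print(error.localizedDescription)
    }
}
```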

1- We create the captureDeviceInput inside a do-catch statement, passing our captureDevice as a parameter.

2- Once it’s created, we check whether we can add it as an input to the captureSession and, if so, we add it.

3- To choose the desired FPS and resolution in an easy way, I recommend using this extension.

4- We print the error if there is a problem with the captureDeviceInput initialization.

There is one more configuration missing, the output. Let’s create it!
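A sketch of the output configuration, following the steps described below (the queue label is arbitrary):

```swift
private func setupCaptureVideoOutput() {
    // 1. Create the output, ask for 32BGRA pixel buffers, and keep late frames.
    captureVideoOutput = AVCaptureVideoDataOutput()
    captureVideoOutput.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
    ]
    captureVideoOutput.alwaysDiscardsLateVideoFrames = false

    // 2. Add the output to the session if possible.
    if captureSession.canAddOutput(captureVideoOutput) {
        captureSession.addOutput(captureVideoOutput)
    }

    // 3. Commit the configuration started in setupSession.
    captureSession.commitConfiguration()

    // 4. Deliver sample buffers on our own serial queue.
    let queue = DispatchQueue(label: "com.camerafeed.videoQueue")
    captureVideoOutput.setSampleBufferDelegate(self, queue: queue)
}
```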

1- In the first lines, we initialize the captureVideoOutput, set the pixel format in the video settings (in this case 32BGRA), and choose not to discard late video frames. This requires more memory, but we don’t want to drop frames because we want a smoother user experience.

2- As with the input, we check whether we can add the output and, if so, we add it.

3- We commit the changes.

4- We create our own queue to manage the sample buffer delegate.

At this point, we see an error like:

“Type of expression is ambiguous without more context”.

That’s because our class has to conform to AVCaptureVideoDataOutputSampleBufferDelegate and implement two methods:

The didOutput method allows us to get the new video frame.

The didDrop method notifies when a video frame was discarded.
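Conforming to the delegate is what makes the error go away (this is also why the class sketch above inherits from NSObject, which the protocol requires). The two methods look like this:

```swift
extension CameraFeed: AVCaptureVideoDataOutputSampleBufferDelegate {

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // A new video frame arrived; we'll process it later.
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didDrop sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // A video frame was discarded.
    }
}
```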

Putting the pieces together

So far we have the base in place. Now it’s time to call the setupSession method.

We create an init method for our class and call setupSession there.

Also, we need to turn on the camera by calling the startRunning method of our captureSession. Let’s create a public method for that and another one to stop the camera.
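A sketch of those pieces (the method names start and stop are assumptions; startRunning blocks the calling thread, so it’s dispatched to a background queue here):

```swift
override init() {
    super.init()
    setupSession()
}

// Turns the camera on.
func start() {
    DispatchQueue.global(qos: .userInitiated).async {
        self.captureSession.startRunning()
    }
}

// Turns the camera off.
func stop() {
    captureSession.stopRunning()
}
```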

Now we can call those methods in our ViewController:
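For example (the view controller name comes from the files listed at the end of the article):

```swift
import UIKit

class CameraViewController: UIViewController {

    private let cameraFeed = CameraFeed()

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        cameraFeed.start()
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        cameraFeed.stop()
    }
}
```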

Processing video frames

If we run the code, nothing happens. That’s because we need to process the output first, and that’s where the magic happens.

In order to delay our camera feed, we need to record chunks of video, configure them, store them in files using the AVAssetWriter class, and then play them in an AVPlayer after the delay we want to add.

Let’s add some properties.
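Something along these lines (the names and the 0.5-second latency are illustrative; the latency is the delay we want to add):

```swift
// Writes each chunk of video to disk.
private var assetWriter: AVAssetWriter!
private var assetWriterInput: AVAssetWriterInput!

// Bookkeeping for the current chunk.
private var chunkNumber = 0
private var chunkStartTime: CMTime!
private var chunkOutputURL: URL!

// The delay we want to add, in seconds (use whatever value you need).
private let latency: Double = 0.5
```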

Now let’s create our writers and configure them.
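A sketch of the method, following the numbered explanation below (the resolution, bit rate, and frame rate values are examples):

```swift
private func createWriterInput(for presentationTimeStamp: CMTime) {
    // 1. Build the URL for this chunk and delete any previous file there.
    let fileManager = FileManager.default
    chunkOutputURL = fileManager.temporaryDirectory
        .appendingPathComponent("chunk\(chunkNumber).mov")
    if fileManager.fileExists(atPath: chunkOutputURL.path) {
        try? fileManager.removeItem(at: chunkOutputURL)
    }

    // 2. The asset writer writes the chunk to a file.
    assetWriter = try? AVAssetWriter(outputURL: chunkOutputURL, fileType: .mov)
    // If the output will be shared over Wi-Fi or Bluetooth:
    // assetWriter.shouldOptimizeForNetworkUse = true

    // 3. The writer input appends samples to a single video track.
    let outputSettings: [String: Any] = [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: 1280,
        AVVideoHeightKey: 720,
        AVVideoCompressionPropertiesKey: [
            AVVideoAverageBitRateKey: 6_000_000,
            AVVideoExpectedSourceFrameRateKey: 30
        ]
    ]
    assetWriterInput = AVAssetWriterInput(mediaType: .video,
                                          outputSettings: outputSettings)
    assetWriterInput.expectsMediaDataInRealTime = true
    assetWriter.add(assetWriterInput)

    // 4. Move on to the next chunk and remember where this one starts.
    chunkNumber += 1
    chunkStartTime = presentationTimeStamp

    // 5. Start writing and open the session at the chunk's start time.
    assetWriter.startWriting()
    assetWriter.startSession(atSourceTime: chunkStartTime)
}
```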

This method receives a CMTime as a parameter, which is used as the starting point of each chunk of video.

1- First, we check whether a recording is already stored at the chunk’s URL and delete it to make space for the next one. We can’t keep a lot of video chunks on the device, because the app would eventually crash due to insufficient storage space.

2- We initialize our assetWriter with the chunkOutputURL and .mov as the fileType. This property is in charge of writing the video to files.
If we want to share the video output over Wi-Fi or Bluetooth, I recommend setting the shouldOptimizeForNetworkUse property to true.

3- We initialize our assetWriterInput with the media type and some settings for the output. In this case, I set the codec, the resolution, and some keys related to the framerate and the bitrate (we can find the list of keys here).
This property is used to append media samples to a single track of an asset writer’s output file.
Also, it’s highly recommended to set the expectsMediaDataInRealTime property to true. This ensures that isReadyForMoreMediaData is calculated properly for real-time usage.

4- We add one to our chunkNumber and we assign the CMTime to our chunkStartTime.

5- We start writing the chunk and start the assetWriter session at the start time.

Now, let’s create another method to end each chunk of video and deliver them to the AVPlayer.
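A sketch of this method, based on the steps described below (onNewChunk is the closure we’ll create in the next section to hand finished chunks to the player):

```swift
private func createVideoOutput(for sampleBuffer: CMSampleBuffer) {
    // 1. The presentation timestamp of the incoming frame.
    let presentationTimeStamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

    if assetWriter == nil {
        // 2. No writer yet: start the first chunk at this timestamp.
        createWriterInput(for: presentationTimeStamp)
    } else {
        // 3. Close the current chunk once it's longer than the desired latency.
        let chunkDuration = CMTimeGetSeconds(
            CMTimeSubtract(presentationTimeStamp, chunkStartTime))

        if chunkDuration > latency {
            assetWriter.endSession(atSourceTime: presentationTimeStamp)

            // Keep local references so the next chunk can start right away.
            let newChunkURL: URL = chunkOutputURL
            let chunkAssetWriter: AVAssetWriter = assetWriter

            chunkAssetWriter.finishWriting {
                // Hand the finished chunk to the player through the closure.
                self.onNewChunk?(AVPlayerItem(url: newChunkURL))
            }
            createWriterInput(for: presentationTimeStamp)
        }
    }

    // 4. Append the frame to the current chunk if the input can take more data.
    if assetWriterInput.isReadyForMoreMediaData {
        assetWriterInput.append(sampleBuffer)
    }
}
```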

This method receives a CMSampleBuffer as a parameter, which contains the frame’s data.

1- First, we get the presentation timestamp that we need to create our next chunk.

2- We check whether our assetWriter is nil; if it is, we call the createWriterInput method and pass the time as a parameter.

3- If the assetWriter is not nil, we get the chunk’s duration. If the duration is greater than the latency (remember to set that latency to whatever you want), we end the assetWriter session and create local references to the URL and the writer, so we can call the finishWriting method asynchronously and send the AVPlayerItem to our AVPlayer through a closure (which we are going to create).

4- If our assetWriterInput is ready for more media, we append the buffer.

Now that we have both methods in place to process the video frames, we need to call createVideoOutput in the didOutput method of the AVCaptureVideoDataOutputSampleBufferDelegate.

Remember to create the closure to send the data to the player.
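In a sketch, the delegate method from earlier now calls createVideoOutput, and the closure (named onNewChunk here as an assumption) exposes each finished chunk as an AVPlayerItem:

```swift
// Called with each finished chunk, ready to be played.
var onNewChunk: ((AVPlayerItem) -> Void)?

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // Run every new frame through the chunk-writing logic.
    createVideoOutput(for: sampleBuffer)
}
```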

Finally, in our ViewController, we can create the player like this:
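The original wiring isn’t reproduced here; one possible setup, extending the controller sketched earlier and assuming an AVQueuePlayer so consecutive chunks play back to back:

```swift
import AVFoundation
import UIKit

class CameraViewController: UIViewController {

    private let cameraFeed = CameraFeed()
    private let player = AVQueuePlayer()
    private var playerLayer: AVPlayerLayer!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Show the delayed feed full screen.
        playerLayer = AVPlayerLayer(player: player)
        playerLayer.frame = view.bounds
        playerLayer.videoGravity = .resizeAspectFill
        view.layer.addSublayer(playerLayer)

        // Enqueue each finished chunk as it arrives.
        cameraFeed.onNewChunk = { [weak self] item in
            DispatchQueue.main.async {
                guard let self = self else { return }
                self.player.insert(item, after: nil)
                if self.player.rate == 0 {
                    self.player.play()
                }
            }
        }
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        cameraFeed.start()
    }
}
```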

If we run the app, we are going to see the delayed video.

Conclusion

I tried some other approaches to delay the camera feed, but this one gave me the best performance and user experience.

This logic can be a bit complicated to understand the first time, but you can play around with the code and check the results.

You can find the whole code in these two files:

CameraFeed
CameraViewController

If you have improvements, tips, or comments, please feel free to leave them in a comment.

Happy Coding!
