This page documents basically everything in the Clip package, including helpers and other things intended for internal use. Much of it is probably not of interest to a general user. Thus, starting with the User guide is recommended before digging too deeply here.
API reference
- class clip.add_subtitles(clip, *args)
Add one or more subtitles to a clip.
- Parameters:
clip – The original clip.
args – Subtitles to add, each a (start_time, end_time, text) triple.
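For example, something like this should work, where lecture.mp4 stands in for any hypothetical input file:
```python
import clip

base = clip.from_file("lecture.mp4")  # hypothetical input file
subtitled = clip.add_subtitles(base,
                               (0.0, 2.5, "Hello, world!"),
                               (2.5, 5.0, "Goodbye."))
```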
- class clip.Align(value)
When stacking clips, how should each be placed?
For horizontal stacks, choose from:
- const Align.TOP:
- const Align.CENTER:
- const Align.BOTTOM:
For vertical stacks, choose from:
- const Align.LEFT:
- const Align.CENTER:
- const Align.RIGHT:
Pass these to clip.hstack() or clip.vstack(). See example usage of Align in text_to_speech.py.
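For instance, here is a minimal sketch that aligns two solid-color clips of different sizes (see solid below):
```python
import clip

a = clip.solid((255, 0, 0), 200, 100, 3)  # red, 200x100, 3 seconds
b = clip.solid((0, 0, 255), 200, 150, 3)  # blue, 200x150, 3 seconds

row = clip.hstack(a, b, align=clip.Align.TOP)      # top edges aligned
column = clip.vstack(a, b, align=clip.Align.LEFT)  # left edges aligned
```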
- clip.alpha_blend(f0, f1)
Blend two equally-sized RGBA images, respecting the alpha channels of each.
- Parameters:
f0 – An image.
f1 – Another image.
- Returns:
The result of alpha-blending f0 onto f1.
Note that this process, as currently implemented, is irritatingly slow, mostly because of the need to convert the images from uint8 format to float64 format and back. Someday, we’ll replace this with something better.
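A small sketch, assuming frames are numpy uint8 arrays as the note above implies:
```python
import numpy as np
import clip

# Translucent red blended onto opaque blue; both frames are 100x100 RGBA.
f0 = np.full((100, 100, 4), (255, 0, 0, 128), dtype=np.uint8)
f1 = np.full((100, 100, 4), (0, 0, 255, 255), dtype=np.uint8)
blended = clip.alpha_blend(f0, f1)
```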
- clip.audio_samples_from_file(filename, cache, expected_sample_rate, expected_num_channels, expected_num_samples)
Extract audio data from a file, which may be either a pure audio format or a video file containing an audio stream.
- Parameters:
filename – The name of the file to read.
cache – A ClipCache that might have the audio we want, or into which it can be stored.
expected_sample_rate – The sample rate we expect to see in the audio, in samples per second.
expected_num_channels – The number of channels we expect to see in the audio, usually either 1 or 2.
expected_num_samples – The number of audio samples we expect to see in each channel.
Raise an exception if the audio that turns up does not match the expected sample rate, number of channels, and approximate number of samples (within about 1 second).
- class clip.AudioClip
Inherit from this for Clip classes that only really have audio, to default to simple black frames for the video. Then only get_samples() and get_subtitles() need to be defined.
- frame_signature(t)
A signature indicating a solid black frame.
- request_frame(t)
Does nothing.
- get_frame(t)
Return a solid black frame.
- class clip.AudioMode(value)
When defining an element of a composite, how should the audio for this element be composited into the final clip?
- Const AudioMode.REPLACE:
Overwrite the existing audio.
- Const AudioMode.ADD:
Add the samples from this element to the existing audio samples.
- Const AudioMode.IGNORE:
Discard the audio from this element.
Pass one of these to the constructor of Element.
- clip.background(clip, bg_color)
Blend a clip onto a same-sized background of the given color.
- Parameters:
clip – A clip to modify.
bg_color – A color (r,g,b) or (r,g,b,a). Each element must be an integer in the range [0,255].
- Returns:
A new clip, the same as the original, but with its video blended atop a solid background of the given bg_color.
See example usage of background in text_to_speech.py.
- clip.bgr2rgb(clip)
Swap the first and third color channels. Useful if, instead of saving, you are sending the frames to something, like PIL, that expects RGB instead of BGR.
- Parameters:
clip – A clip to modify.
- Returns:
A new clip, the same as the original, but with its red and blue swapped.
- clip.black(width, height, length)
A silent solid black clip.
- Parameters:
width – The width of the clip in pixels. A positive integer.
height – The height of the clip in pixels. A positive integer.
length – The length of the clip in seconds. A positive float.
- clip.chain(*args, length=None, fade_time=0)
Concatenate a series of clips. Optionally overlap them a little and fade between them.
- Parameters:
args – The clips to concatenate, given as separate arguments or as lists.
length – The length of the resulting clip, in seconds. Use None to compute the natural length to show all of the given clips.
fade_time – The amount of time, in seconds, to overlap between successive clips, during which we’ll fade from one to the next.
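A quick sketch using the solid-color helpers documented below:
```python
import clip

intro = clip.black(640, 480, 2)
main = clip.white(640, 480, 5)

# The clips overlap for half a second, fading from one to the next.
combined = clip.chain(intro, main, fade_time=0.5)
```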
- clip.check_color(x, name)
Check whether x is an RGB color or an RGBA color. If it’s RGB, return the equivalent opaque RGBA color. If it’s RGBA, return it unchanged. If it’s neither, complain.
Use this to accept parameters that can be RGB or RGBA.
- class clip.Clip
The base class for all clips. It defines a number of abstract methods that must be implemented in subclasses, along with a few helper methods that are available for all clips.
Represents a series of frames with a certain duration, each with identical height and width, along with audio of the same duration.
- metrics
Each clip instance must have an attribute called metrics, an instance of Metrics, specifying the dimensions of the clip. This should be assigned to self.metrics in the constructor.
- default_metrics
A class-level Metrics object, accessed as Clip.default_metrics, representing global default values to use when details are not otherwise specified. These can make code a little cleaner in a lot of places. For example, many silent clips will use the default sample rate for their dummy audio.
- abstract frame_signature(t)
A string that uniquely describes the appearance of this clip at the given time.
- Parameters:
t – A time, in seconds. Should be between 0 and self.length().
- abstract request_frame(t)
Called during the rendering process, before any get_frame() calls, to indicate that a frame at the given time will be needed in the future.
- Parameters:
t – A time, in seconds. Should be between 0 and self.length().
This is used to provide some advance notice to a clip that a get_frame() is coming later. Can help if frames are generated in batches, such as in from_file.
- abstract get_frame(t)
Create and return a frame of this clip at the given time.
- abstract get_samples()
Create and return the audio data for the clip.
- abstract get_subtitles()
Return an iterable of subtitles, each a (start_time, end_time, text) triple.
- length()
Length of the clip, in seconds.
- width()
Width of the video, in pixels.
- height()
Height of the video, in pixels.
- num_channels()
Number of channels in the clip, i.e. 1 for mono or 2 for stereo.
- sample_rate()
Number of audio samples per second.
- num_samples()
Number of audio samples in total.
- readable_length()
A human-readable description of the length.
- request_all_frames(frame_rate)
Submit a request for every frame in this clip.
- Parameters:
frame_rate – The desired frame rate, in frames per second.
- preview(frame_rate, cache_dir='/tmp/clipcache/computed')
Render the video part and display it in a window on screen.
- Parameters:
frame_rate – The desired frame rate, in frames per second.
cache_dir – The directory to use for the frame cache.
- verify(frame_rate, verbose=False)
Call the appropriate methods to fully realize this clip, checking that the right sizes and formats of images are returned by get_frame(), the right length and format of audio is returned by get_samples(), and the right kinds of subtitles are returned by get_subtitles().
Useful for debugging and testing.
- Parameters:
frame_rate – The desired frame rate, in frames per second.
verbose – Set this to True to get lots of diagnostic output.
- stage(directory, cache, frame_rate, filename='')
Get everything for this clip onto disk in the specified directory:
For each frame, a symlink into a cache directory, named in numerical order.
A FLAC file of the audio, called audio.flac.
Subtitles as an SRT file, called subtitles.srt.
- Parameters:
directory – The directory in which to stage things.
cache – A ClipCache to use to get the frames, or to store the frames if they need to be generated.
frame_rate – Output frame rate in frames per second.
filename – An optional name for the file to which the staged frames will be saved. Used here only to make the progress bar more informative.
- save_subtitles(destination)
Save the subtitles for this clip to the given file.
- Parameters:
destination – A string filename or file-like object telling where to send the subtitles.
- get_cached_filename(cache, t)
Make sure the frame is in the given cache, computing it if necessary, and return its filename.
- Parameters:
cache – A ClipCache to retrieve or store the frame.
t – A time, in seconds. Should be between 0 and self.length().
- Returns:
The full path to a file containing the frame at time t.
- compute_and_cache_frame(t, cache, cached_filename)
Call get_frame() to compute one frame, save it to a file, and note in the cache that this new file now exists.
- Parameters:
t – A time, in seconds. Should be between 0 and self.length().
cache – A ClipCache to store the frame after computing it.
cached_filename – The full path, within the cache directory, where the frame should be saved.
- get_frame_cached(cache, t)
Return a frame, from the cache if possible, computed from scratch if needed.
- Parameters:
t – A time, in seconds. Should be between 0 and self.length().
cache – A ClipCache in which to look, and in which to store the frame if it needs to be computed.
- is_silent()
Return True if the audio in this clip is all zero, or False otherwise.
- class clip.ClipCache(directory, frame_format='png')
An object for managing the cache of files. This might contain already-computed frames, audio segments, and other things.
- Parameters:
directory – The cache directory.
frame_format – The file extension to use for cached image frames. This is not used directly in this class, but may be referenced by other code that uses the cache.
- scan_directory()
Examine the cache directory and remember what we see there.
- clear()
Delete all the files in the cache.
- sig_to_fname(sig, ext, use_hash=True)
Compute the filename where something with the given signature and extension should live.
- Parameters:
sig – A unique signature for the thing to be stored. This will be converted to a string and probably hashed to build the filename. Usually it will be the output of the frame_signature() of some Clip class.
ext – A string extension for the file.
use_hash – Should we hash the sig, or just stringify it? Use False to get files with more readable names.
- Returns:
A full pathname within the cache directory telling where an object with the given signature should live.
- lookup(sig, ext, use_hash=True)
Determine the appropriate filename for something with the given signature and extension. Also determine whether that file exists already.
- Parameters:
sig – A unique signature for the thing to be stored. This will be converted to a string and possibly hashed to build the filename. Usually it will be the output of the frame_signature() of some Clip class.
ext – A string extension for the file.
use_hash – Should we hash the sig, or just stringify it? Use False to get files with more readable names.
- Returns:
A tuple of the filename, along with True or False indicating whether that file exists or not.
- insert(filename)
Update the cache to reflect the fact that the given file exists.
- Parameters:
filename – The name of the file to insert.
- class clip.composite(*args, width=None, height=None, length=None)
Combine a collection of clips into one big clip, positioning the constituent clips as directed across space and time.
- Parameters:
args – Element objects, given as separate arguments or as lists. Each element describes a clip to use in the composite along with directions about when and where it should appear and how its audio and video should be integrated.
width – The width of the resulting composite.
height – The height of the resulting composite.
length – The length of the resulting composite.
If any of width, height, or length are omitted, they will be set automatically to be large enough for the given elements.
See example usage of composite in bounce.py.
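Here is a minimal sketch that overlays a text clip on a solid background; the font filename is a placeholder:
```python
import clip

bg = clip.solid((0, 0, 128), 640, 480, 5)
title = clip.draw_text("Hello", "DejaVuSans.ttf", 48,  # hypothetical font path
                       color=(255, 255, 255), length=3)

comp = clip.composite(
    clip.Element(bg, 0, (0, 0)),
    clip.Element(title, 1, (50, 50),
                 video_mode=clip.VideoMode.BLEND,
                 audio_mode=clip.AudioMode.IGNORE))
```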
- class clip.crop(clip, lower_left, upper_right)
Crop the frames of a clip.
- Parameters:
clip – A clip to modify.
lower_left – A point within the clip, given as a pair of integers (x,y).
upper_right – A point within the clip, given as a pair of integers (x,y).
- Returns:
A new clip, the same as the original, but showing only the rectangle between lower_left and upper_right.
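For example, to keep just the central 200x200 region of a 400x400 clip:
```python
import clip

source = clip.solid((0, 255, 0), 400, 400, 2)
cropped = clip.crop(source, (100, 100), (300, 300))  # 200x200 result
```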
- clip.custom_progressbar(task, steps)
A context manager that provides a progress bar.
- Parameters:
task – A short string identifying the process whose progress is being shown.
steps – An integer or float, indicating the number of steps in the process.
When progress occurs, pass the current step number to the update() method of the returned object.
See example usage of custom_progressbar in auto_subtitle.py.
- class clip.draw_text(text, font_filename, font_size, color, length, outline_width=None, outline_color=None)
A clip consisting of just a bit of text.
- Parameters:
text – The string of text to draw.
font_filename – The filename of a TrueType font.
color – A color (r,g,b) or (r,g,b,a). Each element must be an integer in the range [0,255].
font_size – The desired font size, in pixels.
length – The length of the clip in seconds. A positive float.
outline_width – The size of the desired outline, in pixels.
outline_color – The color of the desired outline, given as (r,g,b) or (r,g,b,a). Each element must be an integer in the range [0,255].
The resulting clip will be the right size to contain the desired text, which will be drawn in the given color on a transparent background.
If either of outline_width or outline_color is given, both must be given.
See example usage of draw_text in bounce.py and text_to_speech.py.
- class clip.Element(clip, start_time, position, video_mode=VideoMode.REPLACE, audio_mode=AudioMode.REPLACE)
An element to be included in a composite, including a specific clip to include in the composite and details about when, where, and how that clip should be included.
- Parameters:
clip – The clip to composite.
start_time – The time within the composite, in seconds, when this clip should begin.
position – Where should this clip be positioned? See below.
video_mode – A VideoMode telling what to do with this clip’s video.
audio_mode – An AudioMode telling what to do with this clip’s audio.
For position, give either:
An integer tuple (x, y) indicating where the top left corner of this clip should be positioned.
A callable that accepts a float time t and returns (x, y) for the top left corner at time t.
Values for position can be negative, indicating that part of the clip is out of view, to the left or above.
For details about how to use this class, see composite. See example usage of Element in bounce.py.
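A sketch of the callable form of position, sliding an element rightward at 100 pixels per second:
```python
import clip

ball = clip.solid((255, 255, 0), 50, 50, 4)
moving = clip.Element(ball, 0, lambda t: (int(100 * t), 50))
comp = clip.composite(moving, width=640, height=480, length=4)
```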
- required_dimensions()
Return the (width, height) needed to show this element as fully as possible. Note that these dimensions may not show all of the clip, because things at negative coordinates will still be hidden.
- signature(t)
A signature for this element, uniquely describing the visual contribution this element makes to the overall composite at time t. Analogous to the frame_signature() of a clip. Returns None if this element does not contribute any visual change at the given time.
- get_coordinates(t, shape)
Compute the coordinates at which this element should appear at the given time, based on this element’s position and the given shape.
- Returns:
An integer tuple (left, right, top, bottom).
- request_frame(t)
Note that the given clip will be displayed at the given time. Passes along the request to the clip itself.
- Parameters:
t – A time, in seconds. Should be between 0 and self.length().
- get_subtitles()
Return the subtitles of the constituent clip, shifted appropriately.
- apply_to_frame(under, t)
Modify the given frame as described by this element.
- Parameters:
under – An image frame, presumably from a composite being rendered.
t – A time, in seconds. Should be between 0 and self.length().
- class clip.fade_base(clip, fade_length, transparent=False)
An abstract class to fade in from or out to silent black or silent transparency. Used by fade_in() and fade_out().
- Parameters:
clip – The original clip.
fade_length – The amount of time the fade should last, in seconds.
transparent – Are we fading to/from transparent or black?
- abstract alpha(t)
At the given time, what scaling factor should we apply? An abstract method to allow subclasses to determine things like whether we are fading in or out, and whether it’s at the beginning or the end.
- frame_signature(t)
A signature determined by the original clip and alpha at a given time.
- get_frame(t)
Actually perform the fading: Scale the corresponding frame of the original clip by self.alpha(t).
- clip.fade_between(clip1, clip2)
Fade from one clip to another. Both must have the same length.
- Parameters:
clip1 – The first clip.
clip2 – The second clip.
- class clip.fade_in(clip, fade_length, transparent=False)
Fade in from silent black or silent transparency.
- Parameters:
clip – The original clip.
fade_length – The amount of time the fade should last, in seconds.
transparent – Are we fading from transparent or black?
- class clip.fade_out(clip, fade_length, transparent=False)
Fade out to silent black or silent transparency.
- Parameters:
clip – The original clip.
fade_length – The amount of time the fade should last, in seconds.
transparent – Are we fading to transparent or black?
- clip.ffmpeg(*args, task=None, num_frames=None, callback=None)
Run ffmpeg with the given arguments.
- Parameters:
args – String arguments to pass to ffmpeg. These will have -y and -vstats_file added at the front, to allow overwriting and progress monitoring, respectively.
task – A short string identifying the work that’s being done. Shown in the progress bar. Use None to disable the progress bar.
num_frames – The number of video frames to be processed. Used to maintain the progress bar.
Raises FFMPEGException if ffmpeg exits with an error code.
- class clip.FFMPEGException
Raised when ffmpeg fails.
- class clip.filter_frames(clip, func, name=None, size=None)
A clip formed by passing the frames of another clip through some function.
- Parameters:
clip – The clip to filter.
func – The filter function.
name – A name for the filter.
size – The size of the output frames, or None.
For func, provide a callable that takes either one or two arguments.
If func takes one argument, the argument will be the frame itself.
If func takes two arguments, the arguments will be the frame and its time index.
In either case, func should return the output frame.
The name is an optional string. If a name is given, it is included in the frame signatures. This can help with debugging.
Output frames may have a different size from the input ones, but must all be the same size across the whole clip. The size parameter specifies the size of the output frames. For size, use either None, a (width,height) tuple, or the string “same”.
Set size to None to infer the width and height of the result by executing the filter function on a sample frame. This can be slow if, for example, clip is (or relies upon) a from_file clip or another time-intensive source.
Set size to a tuple of two positive integers (width, height) if you know them. This avoids generating a sample frame.
Set size to “same” to assume the size is the same as the source clip. This avoids generating a sample frame.
Audio remains unchanged from the original clip.
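For instance, a sketch of a one-argument filter that inverts the color channels while keeping the frame size:
```python
import clip

def invert(frame):
    out = frame.copy()
    out[:, :, :3] = 255 - out[:, :, :3]  # invert colors, leave alpha alone
    return out

source = clip.solid((200, 100, 50), 320, 240, 2)
inverted = clip.filter_frames(source, invert, name="invert", size="same")
```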
- class clip.FiniteIndexed(num_frames, frame_rate=None, length=None)
Mixin for clips derived from a finite, ordered sequence of frames. Keeps track of a frame rate and a number of frames, and provides a method for converting times to frame indices.
- Parameters:
num_frames – The positive integer number of frames.
frame_rate – The clip’s frame rate, in frames per second.
length – The clip’s length, in seconds.
Exactly one of frame_rate and length should be given; the other should be None and will be computed to match.
- time_to_frame_index(t)
Which frame would be visible at the given time?
- Parameters:
t – A time, in seconds. Should be between 0 and self.length().
- Returns:
The index of the frame that should appear at time t.
- clip.flatten_args(stuff)
Given a list of arguments, flatten one layer of lists and other iterables.
- Parameters:
stuff – A list of any sort of stuff.
- Returns:
A list containing a flattened version of the stuff. For anything iterable in the stuff, replace that iterable with the items it yields.
- clip.format_seconds_as_hms(seconds)
Format a float number of seconds in the format that ffmpeg likes to see for subtitles.
- Parameters:
seconds – A float number of seconds.
- Returns:
The given time in 00:01:23,456 format.
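For example, 83.456 seconds is 1 minute, 23.456 seconds:
```python
clip.format_seconds_as_hms(83.456)  # should give '00:01:23,456'
```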
- clip.frame_times(clip_length, frame_rate)
Return the timestamps at which frames should occur for a clip of the given length at the given frame rate. Specifically, generate a timestamp at the midpoint of the time interval for each frame.
- Parameters:
clip_length – The length of the clip, in seconds.
frame_rate – The desired frame rate, in frames per second.
- Returns:
A generator that yields appropriate times for all of the frames in the given range at the given rate.
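For example, a 1-second clip at 4 frames per second has frame intervals [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1.0), so the midpoints should be:
```python
list(clip.frame_times(1.0, 4))  # expected: [0.125, 0.375, 0.625, 0.875]
```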
- class clip.from_audio_samples(samples, sample_rate)
An audio clip formed from a given array of samples.
- Parameters:
samples – The audio data. A numpy array with shape (num_samples, num_channels).
sample_rate – The sample rate in samples per second. A positive integer.
The number of channels is determined by the shape of the given samples array. The length of the clip is computed from the shape and the sample rate.
- class clip.from_file(filename, suppress=None, cache_dir=None)
A clip read from a file such as an mp4, flac, or other format readable by ffmpeg.
- Parameters:
filename – The source file to import.
suppress – A list containing some (possibly empty) subset of “video”, “audio”, and “subtitle”. Streams of those types will be ignored.
cache_dir – The directory to use for the frame cache.
Details about what is included in the source file are extracted by parsing the output of ffprobe. We make a best effort to sort out the metrics of the available video and audio streams, but encoded video is complicated, so there are surely many variations that are not yet handled correctly. If you have a video for which this process fails, the maintainers would be interested to see it.
Video is read by asking ffmpeg to “explode” the video stream into individual images. This process takes some time, so the frames are cached for future runs. There can potentially be lots of images, so you’ll want to keep an eye on available disk space.
See example usage of from_file in auto_subtitle.py.
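A minimal sketch, with a hypothetical input file:
```python
import clip

video = clip.from_file("input.mp4", suppress=["subtitle"])
print(video.readable_length())
```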
- class clip.from_rosbag(pathname, topic)
Read images from the given topic in a rosbag and treat them as a silent video.
- Parameters:
pathname – The name of a ROS1 bag file or a ROS2 bag directory.
topic – The name of a topic in the bag file, of type sensor_msgs/msg/Image or sensor_msgs/msg/CompressedImage.
- estimated_frame_rate()
Return an estimate of the native frame rate of this image sequence, based on the median time gap between successive frames.
- class clip.from_zip(filename, frame_rate)
A video clip from images stored in a zip file.
- Parameters:
filename – The name of the zip file to read.
frame_rate – The rate, in frames per second, at which the images in the zip file should be displayed.
If the zip file contains a file called audio.flac, that file will be used for the audio of the resulting clip. In this case, the video and audio parts must have approximately the same length.
The resulting clip will have no subtitles.
- clip.get_duration_from_ffprobe_stream(stream, fmt)
Determine the duration of a stream found by ffprobe, using the container format information in fmt if available.
- Parameters:
stream – A dictionary built from the key-value pairs in an ffprobe stream.
fmt – A dictionary built from the key-value pairs in an ffprobe format.
- Returns:
A float number of seconds of duration for that stream.
Raises ValueError if no duration is found.
- clip.get_font(font, size)
Return a TrueType font for use on Pillow images.
- Parameters:
font – The filename of the desired font.
size – The desired size, in pixels.
- Returns:
A Pillow ImageFont.
This differs from calling ImageFont.truetype() directly only by caching to prevent loading the same font again and again. The performance improvement from this caching seems to be small but non-zero.
- clip.get_framerate_from_ffprobe_stream(stream)
Determine the frame rate for a video stream found by ffprobe.
- Parameters:
stream – A dictionary built from the key-value pairs in an ffprobe stream.
- Returns:
A float number of frames per second for that stream.
Raises ValueError if no frame rate is found.
- clip.get_requested_intervals(requested_indices, max_gap)
For a given set of requested indices, return an iterable of start/stop pairs that covers everything that was requested. If there are gaps smaller than max_gap, include those missing ones too.
- clip.hold_at_end(clip, target_length) → Clip
Extend a clip to fill a target length by repeating its final frame.
- Parameters:
clip – The original clip.
target_length – The desired length, in seconds.
If target_length is less than clip.length(), the result will be truncated to target_length.
- clip.hold_at_start(clip, target_length) → Clip
Extend a clip to fill a target length by repeating its first frame.
- Parameters:
clip – The original clip.
target_length – The desired length, in seconds.
If target_length is less than clip.length(), the result will be truncated to target_length.
- clip.hstack(*args, align=Align.CENTER, min_height=0)
Arrange a series of clips in a horizontal row.
- Parameters:
args – The clips to stack, given as a list or as separate arguments. As a bonus, if any of the args is an integer instead of a clip, padding of that amount will be inserted.
align – How should the clips be aligned if they have different heights? See Align.
min_height – A minimum height for the result, in pixels.
- class clip.image_glob(pattern, frame_rate=None, length=None)
Video from a collection of identically-sized image files that match a unix-style pattern, at a given frame rate or timed to a given length.
- Parameters:
pattern – A wildcard pattern, of the form used by the standard library function glob.glob, identifying a set of image files. These will be the frames of the clip, in sorted order.
frame_rate – The clip’s frame rate, in frames per second.
length – The clip’s length, in seconds.
Exactly one of frame_rate and length should be given; the other should be None and will be computed to match.
- clip.is_bool(x)
Is the given value a boolean?
- clip.is_even(x)
Is the given value an even number?
- clip.is_float(x)
Can the given value be interpreted as a float?
Returns True for floats, integers, and other things for which float(x) succeeds, like strings containing numbers.
- clip.is_int(x)
Is the given value an integer?
Returns True only for actual integers. Notably, this rejects floats, so that if rounding or truncating is going to happen, the user should do it explicitly and therefore be aware of it.
- clip.is_int_point(pt)
Is this a 2D point with integer coordinates?
- clip.is_iterable(x)
Is this a thing that can be iterated?
- clip.is_non_negative(x)
Can the given value be interpreted as a non-negative number?
- clip.is_positive(x)
Is the given value a positive number?
- clip.is_rgb_color(color)
Is this a color, in RGB unsigned 8-bit format?
- clip.is_rgba_color(color)
Is this a color, in RGBA unsigned 8-bit format?
- clip.is_string(x)
Is the given value a string?
- clip.join(video_clip, audio_clip)
A new clip that combines the video of one clip with the audio of another.
- Parameters:
video_clip – A clip whose video you care about.
audio_clip – A clip whose audio you care about.
- Returns:
A clip with the video from video_clip and the audio from audio_clip.
The length of the result will be the length of the longer of the two inputs.
If video_clip is longer than audio_clip, the result will be padded with silence at the end.
If audio_clip is longer than video_clip, the result will be padded with black frames at the end.
See example usage of join in text_to_speech.py.
- class clip.ken_burns(clip, width, height, start_top_left, start_bottom_right, end_top_left, end_bottom_right)
Pan and/or zoom through a clip over time.
Crops and scales each frame of the input clip, smoothly moving the visible portion from a starting rectangle to an ending rectangle across the full duration of the input clip.
- Parameters:
clip – A clip to modify.
width – The desired output width. A positive integer.
height – The desired output height. A positive integer.
start_top_left – Integer coordinates of the top-left corner of the visible rectangle at the start, as an (x,y) tuple.
start_bottom_right – Integer coordinates of the bottom-right corner of the visible rectangle at the start, as an (x,y) tuple.
end_top_left – Integer coordinates of the top-left corner of the visible rectangle at the end, as an (x,y) tuple.
end_bottom_right – Integer coordinates of the bottom-right corner of the visible rectangle at the end, as an (x,y) tuple.
To prevent distortion, all three of these rectangles must have the same aspect ratio:
The output clip, given by width and height.
The visible rectangle at the start.
The visible rectangle at the end.
An exception is raised if these three are not at least approximately equal.
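A sketch that zooms from the full 640x480 frame into its center; the output (400x300), the start rectangle (640x480), and the end rectangle (320x240) all have a 4:3 aspect ratio:
```python
import clip

source = clip.solid((128, 128, 128), 640, 480, 5)
zoomed = clip.ken_burns(source, width=400, height=300,
                        start_top_left=(0, 0), start_bottom_right=(640, 480),
                        end_top_left=(160, 120), end_bottom_right=(480, 360))
```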
- clip.letterbox(clip, width, height)
Fit the clip within the given dimensions, adding black bands on the top/bottom or left/right if needed.
- Parameters:
clip – A clip to modify.
width – The desired width. A positive integer.
height – The desired height. A positive integer.
- clip.loop(clip, length)
Repeat a clip as needed to fill the given length.
- Parameters:
clip – A clip to modify.
length – The target length, in seconds.
- class clip.Metrics(src=None, width=None, height=None, sample_rate=None, num_channels=None, length=None)
An object describing the dimensions of a Clip.
- Parameters:
src – Another Metrics object, from which to draw defaults when other parameters below are omitted. One reasonable choice is Clip.default_metrics. Another is to use the metrics of another clip. If this is None, then all of the other parameters must be given.
width – The width of the clip, in pixels. A positive integer.
height – The height of the clip, in pixels. A positive integer.
sample_rate – The sample rate for the audio of the clip, in samples per second. A positive integer.
num_channels – The number of audio channels. A positive integer, usually 1 or 2.
length – The length of the clip, in seconds. A positive float.
See example usage of Metrics in bounce.py.
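For instance, to start from the global defaults and override only the video details:
```python
import clip

m = clip.Metrics(src=clip.Clip.default_metrics,
                 width=1280, height=720, length=10.0)
m.verify()  # raises TypeError or ValueError if something is wrong
```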
- verify()
Make sure we have valid metrics. If not, raise either TypeError or ValueError depending on what’s wrong.
- verify_compatible_with(other, check_video=True, check_audio=True, check_length=False)
Make sure two Metrics objects match each other. Raise an exception if not.
- Parameters:
other – Another Metrics object to compare to this one.
check_video – Set this to False to ignore differences in the frame sizes.
check_audio – Set this to False to ignore differences in audio sample rate and number of channels.
check_length – Set this to False to ignore differences in the clip lengths.
- num_samples()
- Returns:
The length of the clip, in audio samples.
- readable_length()
- Returns:
A human-readable description of the length.
- clip.metrics_and_frame_rate_from_stream_dicts(streams, filename)
Given a dict containing the audio, video, and subtitle streams of a clip, return the appropriate Metrics object, the float frame rate, and booleans telling whether video, audio, and subtitles exist.
- Parameters:
streams – A dictionary with keys “video”, “audio”, or “subtitle”. Each value is a dictionary built from the key-value pairs in an ffprobe stream.
filename – The name of the file described by streams. Used only for error messages.
- clip.metrics_from_ffprobe_output(ffprobe_output, filename, suppress=None)
Sift through output from ffprobe and try to make a Metrics from it.
- Parameters:
ffprobe_output – A string containing output from ffprobe.
suppress – A list containing some (possibly empty) subset of “video”, “audio”, and “subtitle”. Streams of those types will be ignored.
- Returns:
A Metrics object based on that data, or raise an exception if something strange is in there.
The output should specifically be from:
ffprobe -of compact -show_entries stream
Used indirectly by from_file.
- class clip.mono_to_stereo(clip)
Change the number of channels from one to two.
- Parameters:
clip – A clip to modify, having exactly one audio channel.
- Returns:
A new clip, the same as the original, but with the audio channel duplicated.
- class clip.MutatorClip(clip)
Inherit from this for Clip classes that modify another clip. Override only the parts that need to change.
- frame_signature(t)
By default, re-use the signatures of the original clip.
- request_frame(t)
By default, pass the request unchanged to the original clip.
- get_frame(t)
By default, re-use the frames of the original clip.
- get_subtitles()
By default, re-use the subtitles of the original clip.
- get_samples()
By default, re-use the audio of the original clip.
- clip.parse_hms_to_seconds(hms)
Parse a string in the format that ffmpeg uses for subtitles into a float number of seconds.
- Parameters:
hms – The given time in 00:01:23,456 format.
- Returns:
A float number of seconds for the input string.
- clip.parse_subtitles(srt_text, subtitles_filename=None)
Parse a string of SRT subtitles into the form used in this library.
- Parameters:
srt_text – A string containing subtitle text in SRT format.
subtitles_filename – An optional filename to include if an exception must be raised.
- Returns:
A generator that yields subtitles, each a (start_time, end_time, text) triple.
- clip.patch_audio_length(data, num_samples)
Given a numpy array of audio samples, return an array that has exactly the requested length. Adds zeros or truncates, as needed.
- Parameters:
data – A numpy array representing audio data.
num_samples – The desired number of samples.
- clip.pdf_page(filename, page_num, length, **kwargs)
A silent video constructed from a single page of a PDF.
- Parameters:
filename – The name of a PDF file.
page_num – The page number within the PDF to extract, numbered starting from 1.
length – The desired clip length, in seconds.
kwargs – Keyword arguments to pass along to pdf2image.
For kwargs, see the docs for the pdf2image package. Of particular interest there is size=(width, height) to get an image of a desired size.
- clip.read_image(filename)
Read an image from disk. If needed, convert it to the correct RGBA uint8 format.
- Parameters:
filename – The name of the file to read.
- Returns:
The image data from that file, in RGBA uint8 format.
- class clip.repeat_frame(clip, when, length)
Show the same frame, from another clip, over and over.
- Parameters:
clip – A clip from which to borrow a frame.
when – A time, in seconds. Should be between 0 and clip.length().
length – The length of the clip in seconds. A positive float.
- clip.require(x, func, condition, name, exception_class)
Check a condition and raise an exception if it fails.
- Parameters:
x – A value of some kind.
func – A callable.
condition – A human-readable string name for the condition checked by func.
name – A human-readable string name for x.
exception_class – What class of Exception shall we raise if func(x) is not True?
Call func(x) and check whether it returns a true value. If not, raise an exception of the given class.
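A sketch of a typical call, along the lines of the require_* helpers below:
```python
import clip

frame_rate = 30
# Calls clip.is_positive(frame_rate); raises ValueError if that is not True.
clip.require(frame_rate, clip.is_positive, "positive", "frame rate", ValueError)
```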
- clip.require_bool(x, name)
Raise an informative exception if x is not either True or False.
- clip.require_callable(x, name)
Raise an informative exception if x is not callable.
- clip.require_clip(x, name)
Raise an informative exception if x is not a Clip.
- clip.require_equal(x, y, name)
Raise an informative exception if x and y are not equal.
- clip.require_even(x, name)
Raise an informative exception if x is not even.
- clip.require_float(x, name)
Raise an informative exception if x is not a float.
- clip.require_int(x, name)
Raise an informative exception if x is not an integer.
- clip.require_int_point(x, name)
Raise an informative exception if x is not an integer point.
- clip.require_iterable(x, name)
Raise an informative exception if x is not iterable.
- clip.require_less(x, y, name1, name2)
Raise an informative exception if x is not less than y.
- clip.require_less_equal(x, y, name1, name2)
Raise an informative exception if x is not less than or equal to y.
- clip.require_non_negative(x, name)
Raise an informative exception if x is not 0 or positive.
- clip.require_positive(x, name)
Raise an informative exception if x is not positive.
- clip.require_rgb_color(x, name)
Raise an informative exception if x is not an RGB color.
- clip.require_string(x, name)
Raise an informative exception if x is not a string.
- class clip.resample(clip, sample_rate=None, length=None)
Change the sample rate and/or length.
- Parameters:
clip – A clip to modify.
sample_rate – The desired sample rate.
length – The desired length.
Use None for sample_rate or length to leave that part unchanged.
- class clip.reverse(clip)
Reverse both the video and audio in a clip.
- Parameters:
clip – A clip to modify.
- class clip.ROSImageMessage(reader, tup)
An image message, read from a rosbag. Used by from_rosbag.
- Parameters:
reader – A Reader object from the rosbags package.
tup – A tuple (connection, topic, rawdata) supplied by reader.
- image()
Decode, possibly decompress, and return the image encoded in this message.
- clip.save_audio(clip, filename)
Save the audio part of a clip to an audio format.
- Parameters:
clip – The clip to save.
filename – A file name to write the audio to.
The file format is determined by the extension of the given filename. The list of supported formats is determined by what is supported by the libsndfile library, but the most common formats like WAV and FLAC are likely to work.
- clip.save_gif(clip, filename, frame_rate, cache_dir='/tmp/clipcache/computed', burn_subtitles=False)
Save a clip to an animated GIF.
- Parameters:
clip – The clip to save.
filename – A file name to write to.
frame_rate – Output frame rate in frames per second.
cache_dir – The directory to use for the frame cache.
burn_subtitles – Should the frames be modified to include the subtitle text?
- clip.save_mp4(clip, filename, frame_rate, bitrate=None, target_size=None, two_pass=None, preset='slow', cache_dir='/tmp/clipcache/computed', burn_subtitles=False)
Save a clip to an MP4 file.
- Parameters:
clip – The clip to save.
filename – A file name to write to.
frame_rate – Output frame rate in frames per second.
bitrate – The target bitrate in bits per second.
target_size – The target filesize in megabytes.
preset – A string that controls how quickly ffmpeg encodes. See below.
two_pass – Should the ffmpeg encoding run twice or just once?
cache_dir – The directory to use for the frame cache.
burn_subtitles – Should the frames be modified to include the subtitle text?
At most one of bitrate and target_size should be given.
If a bitrate is given, it will be passed along to ffmpeg as a target.
If a target_size is given, we compute the appropriate bitrate to attempt to get close to that target.
If both are omitted, the default is to target a bitrate of 1024k.
For preset, choose from:
“ultrafast”
“superfast”
“veryfast”
“faster”
“fast”
“medium”
“slow”
“slower”
“veryslow”
The ffmpeg documentation for these says to “use the slowest preset you have patience for.” Default is slow.
Using two_pass makes things slower because the encoding process happens twice, but can improve the results, particularly when using target_size. Default is to use two_pass only when a target_size is given.
See example usage of save_mp4 in auto_subtitle.py, bounce.py, and text_to_speech.py.
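A minimal sketch; output.mp4 is a placeholder filename:
```python
import clip

final = clip.solid((0, 0, 0), 640, 480, 3)
clip.save_mp4(final, "output.mp4", frame_rate=30, preset="medium")
```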
- clip.save_play_quit(clip, frame_rate, filename='spq.mp4')
Save the video as an MP4, play it, and then end the process. Useful sometimes when debugging, to see a particular clip without running the entire program.
- clip.save_rosbag(clip, pathname, frame_rate, compressed=True, topic=None, frame_id='/camera', fmt=None)
Save the video portion of a clip as a ROS2 rosbag.
- Parameters:
clip – The clip to save.
pathname – The name of the directory to write to.
frame_rate – The desired frame rate.
compressed – Boolean telling whether the images should be compressed.
topic – The topic name.
frame_id – The frame_id to use in the message headers.
fmt – The format or encoding to use for the images.
If compressed is True:
The topic will default to /camera/image_raw/compressed.
The topic’s data type will be sensor_msgs/CompressedImage.
The fmt parameter will refer to the compression format, defaulting to rgb8; jpeg compressed bgr8.
If compressed is False:
The topic will default to /camera/image_raw.
The topic’s data type will be sensor_msgs/Image.
The fmt parameter will refer to the encoding of the images, defaulting to rgb8.
- clip.save_via_ffmpeg(clip, filename, frame_rate, output_args, use_audio, use_subtitles, cache_dir, two_pass)
Use ffmpeg to save a clip with the given arguments describing the desired output.
- Parameters:
clip – The clip to save.
filename – A file name to write the video to.
frame_rate – The frame rate to use for the final output.
output_args – A list of string arguments to pass to ffmpeg. These will appear after arguments setting up the inputs.
use_audio – Should the audio be included in the ffmpeg input?
use_subtitles – Should the subtitles be included in the ffmpeg input?
cache_dir – The directory to use for the frame cache.
two_pass – Should ffmpeg run twice or just once?
- clip.save_zip(clip, filename, frame_rate, include_audio=True, include_subtitles=None)
Save a clip to a zip archive of numbered images.
- Parameters:
clip – The clip to save.
filename – A file name to write to.
frame_rate – Output frame rate in frames per second.
include_audio – Should the audio be included?
include_subtitles – Should the subtitles be included? Use None to include a subtitles file only if there are more than zero subtitles in the clip.
- class clip.scale_alpha(clip, factor)
Scale the alpha channel of a given clip by the given factor.
- Parameters:
clip – The clip to modify.
factor – A positive float by which to scale the alpha channel of the given clip, or a callable that accepts a time and returns the scaling factor to use at that time.
See example usage of scale_alpha in bounce.py.
- clip.scale_by_factor(clip, factor)
Scale the size of the frames of a clip by a given factor.
- Parameters:
clip – The clip to modify.
factor – A positive float scaling factor.
- clip.scale_to_fit(clip, max_width, max_height)
Scale the frames of a clip to fit within the given constraints, maintaining the aspect ratio.
- Parameters:
clip – The clip to modify.
max_width – The maximum width of the result. A positive integer.
max_height – The maximum height of the result. A positive integer.
- clip.scale_to_size(clip, width, height)
Scale the frames of a clip to a given size, possibly distorting them.
- Parameters:
clip – The clip to modify.
width – The width of the result. A positive integer.
height – The height of the result. A positive integer.
- class clip.scale_volume(clip, factor)
Scale the volume of audio in a clip.
- Parameters:
clip – A clip to modify.
factor – A float.
- Returns:
A new clip, the same as the original, but with each of its audio samples multiplied by factor.
- clip.sha256sum_file(filename)
Hash the contents of a file.
- Parameters:
filename – The name of a file to read.
- Returns:
A short hexadecimal hash of the contents of the file.
This implementation uses sha256.
- class clip.silence_audio(clip)
Replace whatever audio we have with silence.
- Parameters:
clip – A clip to modify.
- Returns:
A new clip, the same as the original, but with silent audio.
The sample rate and number of channels remain unchanged.
- class clip.sine_wave(frequency, volume, length, sample_rate, num_channels)
A sine wave with the given frequency.
- Parameters:
frequency – The desired frequency, in hertz.
volume – The desired volume, between 0 and 1.
length – The desired length, in seconds.
sample_rate – The sample rate in samples per second. A positive integer.
num_channels – The number of channels, usually 1 or 2.
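For example, a two-second stereo 440 Hz tone at half volume:
```python
import clip

beep = clip.sine_wave(440, 0.5, 2.0, 48000, 2)
```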
- class clip.slice_clip(clip, start=0, end=None)
Extract the portion of a clip between the given times.
- Parameters:
clip – A clip to modify.
start – A nonnegative starting time. Defaults to the beginning of the clip.
end – An ending time, at most clip.length(). Use None for the end of the clip.
See example usage of slice_clip in auto_subtitle.py.
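For example, to extract a five-second excerpt:
```python
import clip

source = clip.white(320, 240, 30)
middle = clip.slice_clip(source, start=5, end=10)
```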
- clip.slice_out(clip, start, end)
Remove the part between the given endpoints.
- Parameters:
clip – The clip to modify.
start – A non-negative float starting time.
end – A non-negative float ending time.
- Returns:
The original clip, but missing the portion between the two given times.
- class clip.solid(color, width, height, length)
A silent video clip in which each frame has the same solid color.
- Parameters:
color – A color (r,g,b) or (r,g,b,a). Each element must be an integer in the range [0,255].
width – The width of the clip in pixels. A positive integer.
height – The height of the clip in pixels. A positive integer.
length – The length of the clip in seconds. A positive float.
- class clip.spin(clip, total_rotations)
Rotate the contents of a clip about its center.
- Parameters:
clip – The clip to modify.
total_rotations – A positive float indicating how many total rotations to make throughout the original clip’s duration.
The resulting clip will be a square, large enough to show all of the original clip at every point of its rotation.
The angular velocity is computed so that the result will complete the requested rotations within the length of the original clip.
- clip.stack_clips(*args, align, min_dim=0, vert, name)
Arrange a series of clips in a stack, either vertically or horizontally. Probably use vstack() or hstack() to call this.
- Parameters:
args – The clips to stack, given as a list or as separate arguments. As a bonus, if any of the args is an integer instead of a clip, padding of that amount will be inserted.
align – How should the clips be aligned? See Align.
min_dim – A minimum height (if we are stacking horizontally) or width (if we are stacking vertically).
vert – True if we are stacking vertically; False if we are stacking horizontally.
name – A name to use in error messages. Probably “vstack” or “hstack”.
- class clip.static_frame(the_frame, frame_name, length)
Show a single image over and over, silently.
- Parameters:
the_frame – The image to display.
frame_name – A unique name for the frame, to be used in frame signatures.
length – The length of the clip in seconds. A positive float.
This is for cases where you have an image already in memory. If you want to load an image file, see static_image(). If the frame you want to repeat is part of another clip, see repeat_frame. See also pdf_page.
- frame_signature(t)
A string that uniquely describes the appearance of this clip at the given time.
- Parameters:
t – A time, in seconds. Should be between 0 and self.length().
- request_frame(t)
Called during the rendering process, before any get_frame() calls, to indicate that a frame at the given time will be needed in the future.
- Parameters:
t – A time, in seconds. Should be between 0 and self.length().
This is used to provide some advance notice to a clip that a get_frame() is coming later. Can help if frames are generated in batches, such as in from_file.
- get_frame(t)
Create and return a frame of this clip at the given time.
- get_subtitles()
Return an iterable of subtitles, each a (start_time, end_time, text) triple.
- clip.static_image(filename, length)
Show a single image loaded from a file over and over, silently.
- Parameters:
filename – The name of the file to read.
length – The length of the clip in seconds. A positive float.
- class clip.stereo_to_mono(clip)
Change the number of channels from two to one.
- Parameters:
clip – A clip to modify, having exactly two audio channels.
- Returns:
A new clip, the same as the original, but with the two audio channels averaged into just one.
See example usage of stereo_to_mono in auto_subtitle.py.
- clip.subtitles_from_file(filename, cache)
Extract subtitles from a file.
- Parameters:
filename – The name of a media file that includes a subtitle stream.
cache – A ClipCache that might have the subtitle stream we want, or into which it can be stored.
- Returns:
A generator that yields subtitles, each a (start_time, end_time, text) triple.
- clip.superimpose_center(under_clip, over_clip, start_time, video_mode=VideoMode.REPLACE, audio_mode=AudioMode.ADD)
Superimpose one clip on another, in the center of each frame, starting at a given time.
- Parameters:
under_clip – The main clip.
over_clip – Another clip to show atop the main clip.
start_time – The time at which over_clip should begin. A non-negative float.
video_mode – A VideoMode telling what to do with the video in over_clip.
audio_mode – An AudioMode telling what to do with the audio in over_clip.
- clip.temporarily_changed_directory(directory)
A context in which the current directory has been changed to the given one, which should exist already.
When the context ends, change the current directory back.
- clip.temporary_current_directory()
A context in which the current directory is a new temporary directory.
When the context begins, a new temporary directory is created. This new directory becomes the current directory.
When the context ends, the current directory is restored and the temporary directory is vaporized.
- clip.timewarp(clip, factor)
Speed up a clip by the given factor.
- Parameters:
clip – A clip to modify.
factor – A float factor by which to scale the clip’s speed.
- clip.to_default_metrics(clip)
Adjust a clip so that its metrics match the default metrics: Letterbox video and resample to match frame rate and sample rate. Useful if assorted clips from various sources will be chained together.
- Parameters:
clip – A clip to modify.
- clip.to_monochrome(clip)
Convert a clip’s video to monochrome.
- Parameters:
clip – A clip to modify.
- Returns:
A new clip, the same as the original, but with its video converted to monochrome.
- class clip.VideoClip
Inherit from this for Clip classes that really only have video, to default to silent audio.
- get_samples()
Return audio samples appropriate to use as a default audio. That is, silence with the appropriate metrics.
- get_subtitles()
Return an iterable of subtitles, each a (start_time, end_time, text) triple.
- class clip.VideoMode(value)
When defining an element of a composite, how should the pixels from this element be combined with any existing pixels that it covers, to form the final clip?
- Const VideoMode.REPLACE:
Overwrite the existing pixels.
- Const VideoMode.BLEND:
Use the alpha channel to blend pixels from this element into the existing pixels.
- Const VideoMode.ADD:
Add the pixel values from this element and the existing pixel values.
- Const VideoMode.IGNORE:
Discard the video from this element.
Pass one of these to the constructor of Element. See example usage of VideoMode in bounce.py.
- clip.vstack(*args, align=Align.CENTER, min_width=0)
Arrange a series of clips in a vertical column.
- Parameters:
args – The clips to stack, given as a list or as separate arguments. As a bonus, if any of the args is an integer instead of a clip, padding of that amount will be inserted.
align – How should the clips be aligned if they have different widths? See Align.
min_width – A minimum width for the result, in pixels.
See example usage of vstack in bounce.py and text_to_speech.py.
- clip.white(width, height, length)
A silent solid white clip.
- Parameters:
width – The width of the clip in pixels. A positive integer.
height – The height of the clip in pixels. A positive integer.
length – The length of the clip in seconds. A positive float.