Usage

Import the library:

import xarray_video as xv

This also imports and registers the video accessors and the zarr codecs for video data.

Open a video file

If you don’t have a file to test with, grab one of the xarray-video test files:

import urllib.request
urllib.request.urlretrieve("https://github.com/oceanum-io/xarray-video/raw/main/tests/data/ocean_test.mp4", "ocean_test.mp4")

v=xv.open_video("ocean_test.mp4")
print(v)

Create a time coordinate by defining the start time of the video:

v=xv.open_video("ocean_test.mp4", start_time="2022-01-01 00:00:00")
print(v)
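
How the time coordinate is built is not spelled out here. Presumably each frame's timestamp is the start time plus the frame index divided by the frame rate. A minimal sketch of that arithmetic, assuming a constant frame rate of 25 fps; both the frame rate and the helper `frame_times` are illustrative and not part of xarray-video:

```python
from datetime import datetime, timedelta

def frame_times(start_time, n_frames, fps=25.0):
    """Return one timestamp per frame, assuming a constant frame rate (hypothetical helper)."""
    start = datetime.fromisoformat(start_time)
    return [start + timedelta(seconds=i / fps) for i in range(n_frames)]

times = frame_times("2022-01-01 00:00:00", n_frames=3)
print(times[1])  # 2022-01-01 00:00:00.040000
```

At the assumed 25 fps, consecutive frames are 40 ms apart, so a 10 second window covers roughly 250 frames.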

Have a look at the last frame of the video:

v["video"][-1].video.plot()

Play a preview:

v["video"].video.play()

If you have problems with the matplotlib graphics backend, try running the example in IPython. In a notebook environment, you may need to run %matplotlib widget first for the play function to work correctly.

Subset a video dataset

Subset the first 100 frames of the video:

v_subset=v.isel(frame=slice(0,100))

Get only every 25th frame:

v_subset=v.isel(frame=slice(None,None,25))

Or subset by the time coordinate:

v_subset=v.sel(time=slice("2022-01-01 00:00:00", "2022-01-01 00:00:10"))
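
The `slice` objects passed to `isel` follow ordinary Python slicing rules, so they can be tried out on a plain list first. A small sketch using a stand-in index for a hypothetical 100-frame video:

```python
frames = list(range(100))  # stand-in for the frame index of a 100-frame video

first_ten = frames[slice(0, 10)]            # positions 0..9, the stop value is excluded
every_25th = frames[slice(None, None, 25)]  # start and stop left open, step of 25

print(len(first_ten))  # 10
print(every_25th)      # [0, 25, 50, 75]
```

Label-based selection with `sel` differs in one respect: xarray includes both endpoints of a label slice, so the time selection above keeps the frame stamped exactly 00:00:10 if one exists.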

Get a part of each video frame (here the first 500 pixels along x):

v_subset=v.isel(pixel_x=slice(0,500))
v_subset["video"].video.play()

Write back to a file

Write to video file:

v_subset.video.to_video("onlyhalf.mp4")

Write to netcdf:

v_subset.to_netcdf("my_video.nc")

Write to zarr using the video codec (note the video accessor):

v_subset.video.to_zarr("video_h264.zarr")
!du -h video_h264.zarr

Or use vanilla zarr, if you want:

v_subset.to_zarr("video_blosc.zarr")
!du -h video_blosc.zarr

Open the zarr archive written with the video codec to check that xarray can read it:

import xarray as xr
vz=xr.open_dataset("video_h264.zarr")
vz["video"].video.play()

Test compression performance

Write the complete dataset to zarr using vanilla blosc and the video codec:

import time

t0=time.time()
v.to_zarr("test_blosc.zarr", mode="w")
t1=time.time()
t_blosc=t1-t0

t0=time.time()
v.video.to_zarr("test_h264.zarr", mode="w")
t1=time.time()
t_h264=t1-t0
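
The two timing blocks repeat the same pattern, so a small helper can keep them in step. A sketch using only the standard library; `timed` is a hypothetical name, and `time.perf_counter` is swapped in for `time.time` because it is a monotonic clock intended for measuring intervals:

```python
import time

def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - t0

# Stand-in workload; with the dataset above the call might look like
# timed(v.to_zarr, "test_blosc.zarr", mode="w")
total, elapsed = timed(sum, range(1_000_000))
print(f"{elapsed:.4f} s")
```

A single run is a rough measurement at best; disk caching and one-off codec start-up costs can shift either number noticeably.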

Get sizes of the zarr archives:

from pathlib import Path

# Measure the archives written in the timing step above
h264_directory = Path('test_h264.zarr')
h264_size=sum(f.stat().st_size for f in h264_directory.glob('**/*') if f.is_file())
blosc_directory = Path('test_blosc.zarr')
blosc_size=sum(f.stat().st_size for f in blosc_directory.glob('**/*') if f.is_file())
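
The same size calculation can be wrapped in a function so that both archives are measured identically. A minimal sketch; `directory_size` is a hypothetical helper, demonstrated here on a temporary directory rather than a real zarr archive:

```python
import tempfile
from pathlib import Path

def directory_size(path):
    """Total size in bytes of all regular files below path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny directory tree with known file sizes
    (Path(tmp) / "chunks").mkdir()
    (Path(tmp) / "chunks" / "0.0").write_bytes(b"x" * 1024)
    (Path(tmp) / "meta.json").write_bytes(b"{}")
    demo_size = directory_size(tmp)
    print(demo_size)  # 1026
```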

compression_ratio=(blosc_size/h264_size)

time_ratio=(t_blosc/t_h264)

print(compression_ratio, time_ratio)

For this video, the video codec produces an archive more than 300x smaller than blosc, but writing it takes about twice as long.

Just for fun ;), calculate the Weissman score for the complete video:

import numpy as np

# blosc is the reference compressor; compression_ratio is already relative to it
weissman=compression_ratio*np.log(t_blosc)/np.log(t_h264)

Weissman score of almost 250!!! Richard Hendricks would be pretty happy with that.
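
For reference, the calculation above follows the usual form of the (fictional) Weissman score, W = alpha * (r / r_ref) * log(T_ref) / log(T), with blosc as the reference compressor and alpha = 1. A sketch as a standalone function; the name `weissman_score` and the example numbers are illustrative, not measurements from this video:

```python
import math

def weissman_score(ratio, ref_ratio, t, ref_t, alpha=1.0):
    """Weissman score of a compressor relative to a reference compressor.

    ratio, ref_ratio: compression ratios of the candidate and the reference
    t, ref_t: compression times of the candidate and the reference
    """
    return alpha * (ratio / ref_ratio) * math.log(ref_t) / math.log(t)

# Illustrative numbers only: 300x better ratio, 100 s versus 10 s
score = weissman_score(ratio=300.0, ref_ratio=1.0, t=100.0, ref_t=10.0)
print(round(score, 6))  # 150.0
```

Because the formula takes logarithms of the times, the result depends on the time unit, changes sign when a time drops below 1, and is undefined when the candidate time equals exactly 1.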