Usage#
Import the library:
import xarray_video as xv
This will also automatically import and register the video accessors and the zarr codecs for video data
Open a video file#
If you don’t have file to test, grab one of the xarray-video test files:
import urllib.request
urllib.request.urlretrieve("https://github.com/oceanum-io/xarray-video/raw/main/tests/data/ocean_test.mp4", "ocean_test.mp4")
v=xv.open_video("ocean_test.mp4")
print(v)
Create a time coordinate by defining the start time of the video:
v=xv.open_video("ocean_test.mp4", start_time="2022-01-01 00:00:00")
print(v)
Have look at the last frame of the video:
v["video"][-1].video.plot()
Play a preview:
v["video"].video.play()
If you have problems with the matplotlib graphics backend, try using ipython. In a notebook environment, you may have to do %matplotlib widget to get the play function to work correctly.
Subset a video dataset#
Subset the first 100 frames of the video:
v_subset=v.isel(frame=slice(0,100))
get only once every 25 frames:
v_subset=v.isel(frame=slice(None,None,25))
or if you want to subset by a time coordinate:
v_subset=v.sel(time=slice("2022-01-01 00:00:00", "2022-01-01 00:00:10"))
Get a part of the video frame:
v_subset=v.isel(pixel_x=slice(0,500))
v_subset["video"].video.play()
Write back to a file#
Write to video file:
v_subset.video.to_video("onlyhalf.mp4")
Write to netcdf:
v_subset.to_netcdf("my_video.nc")
Write to zarr using the video codec (note the video accessor):
v_subset.video.to_zarr("video_codec.zarr")
!du -h video_codec.zarr
Or use vanilla zarr, if you want:
v_subset.to_zarr("video_blosc.zarr")
!du -h video_blosc.zarr
Open the zarr archive with video codec to check xarray can open it:
import xarray as xr
vz=xr.open_dataset("video_h264.zarr")
vz["video"].video.play()
Test compression performance#
Write the complete dataset to zarr using vanilla blosc and the video codec:
t0=time.time()
v.to_zarr("test_blosc.zarr", mode="w")
t1=time.time()
t_blosc=t1-t0
t0=time.time()
v.video.to_zarr("test_h264.zarr", mode="w")
t1=time.time()
t_h264=t1-t0
Get sizes of the zarr archives:
from pathlib import Path
h264_directory = Path('video_h264.zarr')
h264_size=sum(f.stat().st_size for f in h264_directory.glob('**/*') if f.is_file())
blosc_directory = Path('video_blosc.zarr')
blosc_size=sum(f.stat().st_size for f in blosc_directory.glob('**/*') if f.is_file())
compression_ratio=(blosc_size/h264_size)
time_ratio=(t_blosc/t_h264)
We can see that for video data the video codec compression is more than 300x better than blosc. But it is also about a factor of 2 slower.
Just for fun ;), calculate the Weissman score for the complete video:
import numpy as np
weissman=compression_ratio*np.log(t_blosc)/np.log(t_264)
Weissman score of almost 250!!! Richard Hendricks would be pretty happy with that.