Port Cube

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
Rocky
Posts: 3555
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Two things.

1. Where are you loading avsresize.dll?

2. Can you give me the cube file?

It's working fine for me with this script, only differing from yours in the cube file (which must be size 65):

loadplugin("d:\don\Programming\C++\dgdecnv\DGDecodeNV\x64\Release\dgdecodenv.dll")
loadplugin("avsresize.dll")
loadplugin("D:\Don\Programming\C++\Avisynth filters\DGCube\x64\Release\dgcube.dll")
dgsource("THE GREAT WALL.dgi")
#From 4:2:2 16bit planar Narrow Range to RGB Planar 16bit Full Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
#From PQ to HLG with 16bit precision
Cube("PQ_to_BT709_slope.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV422 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV422P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Post by Guest 2 »

Rocky wrote:
Mon Aug 08, 2022 7:15 am
1. Where are you loading avsresize.dll?
No need as it's in AVS+ default folders.
Rocky wrote:
Mon Aug 08, 2022 7:15 am
2. Can you give me the cube file?
I can give you an identity one, which crashes too:
IDENTITY.7z
(10.44 KiB) Downloaded 252 times
Rocky wrote:
Mon Aug 08, 2022 7:15 am
which must be size 65
Do you mean kB, or something else? Both the BBC and Warner Bros licensed cube files are around 1 MB.

I asked a friend of mine on Quadro + Xeon workstation to try it too and he can confirm that:

1) 709 to HLG with commercial(*) cube works
2) PQ to HLG with commercial(*) cube crashes
3) HLG to PQ with commercial(*) cube crashes

(*) both BBC and Warner Bros

AVSCube can ingest the various cubes with no issues.

Post by Rocky »

The cube dimension must be 65. Checking...

Post by Rocky »

Your cube is size 33. I'll generalize the size and give a new test version. After DGDecNV 444.

Post by Guest 2 »

Rocky wrote:
Mon Aug 08, 2022 9:09 am
Your cube is size 33.
Is the 65 size related to CUDA only? AVSCube works ok.

Post by Rocky »

DGCube (CUDA) was hardwired for 65. Cube (CPU) could handle any size.

Re-download DGCube to get the fix, i.e., ability to open any size cube file.
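
For anyone who wants to check a cube's dimension before loading it, here is a minimal Python sketch. It is not part of DGCube. It assumes the file follows the usual .cube text layout, in which a LUT_3D_SIZE header line gives the number of entries per axis; the function name read_cube_size and the sample text are made up for illustration.

```python
import io

def read_cube_size(lines):
    """Return the LUT_3D_SIZE value found in the header of a .cube file."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0] == "LUT_3D_SIZE":
            return int(parts[1])
    raise ValueError("no LUT_3D_SIZE keyword found")

# Hypothetical in-memory sample; for a real file, pass open("some.cube") instead.
sample = io.StringIO('TITLE "example"\nLUT_3D_SIZE 33\n0.0 0.0 0.0\n')
size = read_cube_size(sample)
print(size, "entries per axis,", size ** 3, "RGB triplets")
```

A size-33 cube holds 33^3 = 35937 RGB triplets and a size-65 cube holds 65^3 = 274625, so the size is a grid dimension, not a file size in kB.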

Post by Guest 2 »

Rocky wrote:
Mon Aug 08, 2022 11:57 am
Re-download DGCube to get the fix, i.e., ability to open any size cube file.
Quickly tested, and it seems to be working.

I will bench and post results.

Post by Guest 2 »

Preliminary benchmarks before encoding, using AVSMeter64 + GPU-Z.

AVSCube script:

LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\AVSCube\VSCube.dll")
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0)
propClearAll()
#From 4:2:0 16bit planar Narrow Range to RGB Planar 16bit Full Range
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
#From PQ to HLG with 16bit precision
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
#From RGB 16bit planar Full Range to YUV420 10bit planar Narrow Range with dithering
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Frame width: 3840
Frame height: 2064
Framerate: 23.976 (24000/1001)
Colorspace: YUV420P10

FPS (cur | min | max | avg): 4.617 | 1.325 | 4.669 | 4.165
Process memory usage: 687 MiB
Thread count: 11
CPU usage (current | average): 6.7% | 6.8%

GPU usage (current | average): 2% | 9%
VPU usage (current | average): 3% | 11%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 37.6 W | 25.4 W

Same with Prefetch(8):

FPS (cur | min | max | avg): 2.859 | 0.910 | 108696 | 11.83
Process memory usage: 2014 MiB
Thread count: 22
CPU usage (current | average): 65.8% | 71.4%

GPU usage (current | average): 8% | 14%
VPU usage (current | average): 11% | 21%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 16.5 W | 38.3 W

DGCube script:

LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\DGCube.dll")
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0)
propClearAll()
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

FPS (cur | min | max | avg): 4.617 | 1.325 | 4.669 | 4.165
Process memory usage: 687 MiB
Thread count: 11
CPU usage (current | average): 6.7% | 6.8%

GPU usage (current | average): 2% | 9%
VPU usage (current | average): 3% | 11%
GPU memory usage: 1143 MiB
GPU Power Consumption (cur | avg): 37.6 W | 25.4 W

Same with Prefetch(8):

FPS (cur | min | max | avg): 1.433 | 0.567 | 175439 | 10.15
Process memory usage: 2702 MiB
Thread count: 27
CPU usage (current | average): 64.3% | 62.6%

GPU usage (current | average): 37% | 28%
VPU usage (current | average): 28% | 10%
GPU memory usage: 2858 MiB
GPU Power Consumption (cur | avg): 50.1 W | 47.1 W

Post by Guest 2 »

Real world scenario: 4k PQ video to 1080p HLG video with denoise and x265 encoding.

Script:

SetMemoryMax()
SetFilterMTMode("DEFAULT_MT_MODE", 2)
LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
LoadPlugin("D:\Eseguibili\Media\AVSCube\VSCube.dll") # or DGCube
DGSource("F:\In\2_0446 Akira\akira.dgi",ct=48,cb=48,cl=0,cr=0, rw=1920, rh=1032)
propClearAll()
CompTest(1)
z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
ConvertBits(32)
BM3D_CUDA(sigma=3, radius=2)
BM3D_VAggregate(radius=2)
fmtc_bitdepth (bits=10,dmode=8)
neo_f3kdb(range=15, Y=65, Cb=40, Cr=40, grainY=0, grainC=0, sample_mode=2, blur_first=true, dynamic_grain=false, mt=false, keep_tv_range=true)
Prefetch(1) # 1,4,6

x265.exe --crf 22 --output-depth 10 --aq-mode 5 --fades --colorprim bt2020 --colormatrix bt2020nc --transfer arib-std-b67 --range limited --min-luma 64 --max-luma 940 --output "F:\In\2_0446 Akira\akira_cube_temp\akira_cube_out.hevc" "F:\In\2_0446 Akira\akira_cube_temp\akira_cube.avs"

AVSCube:

no Prefetch: encoded 1792 frames in 525.70s (3.41 fps), 1244.06 kb/s, Avg QP:26.48
Prefetch(4): encoded 1792 frames in 430.76s (4.16 fps), 1244.31 kb/s, Avg QP:26.49
Prefetch(6): encoded 1792 frames in 346.58s (5.17 fps), 1242.82 kb/s, Avg QP:26.49

DGCube:
no Prefetch: encoded 1792 frames in 525.13s (3.41 fps), 1242.09 kb/s, Avg QP:26.50
Prefetch(4): encoded 1792 frames in 415.67s (4.31 fps), 1244.13 kb/s, Avg QP:26.50
Prefetch(6): encoded 1792 frames in 351.01s (5.11 fps), 1243.90 kb/s, Avg QP:26.49

It's really strange that both you and I are getting CPU and GPU results that are so closely aligned. I'm starting to think that the bottleneck is somewhere else.

Post by Guest 2 »

Aligned results also with x265.exe --crf 20 --preset slow --output-depth 10 --aq-mode 5 --fades --colorprim bt2020 --colormatrix bt2020nc --transfer arib-std-b67 --range limited --min-luma 64 --max-luma 940 --output "F:\In\2_0446 Akira\akira_cube_6_temp\akira_cube_6_out.hevc" "F:\In\2_0446 Akira\akira_cube_6_temp\akira_cube_6.avs"

AVSCube: encoded 1792 frames in 673.69s (2.66 fps), 1716.09 kb/s, Avg QP:24.01
DGCube: encoded 1792 frames in 696.78s (2.57 fps), 1716.41 kb/s, Avg QP:24.02

According to Agatha Christie, one coincidence is just a coincidence, two coincidences are a clue, three coincidences are a proof.

Post by Rocky »

It's just that the actual 3D LUT application is tiny compared to everything else, for both versions. I compared scripts with BlankClip() source and no conversions and things are as expected. DGCube is faster with no prefetch. Cube is faster with prefetch; however, each additional prefetch thread comes with more CPU utilization. So for transcoding, DGCube could be useful when the encoding load is high, compared to Cube with prefetch.

I'm going to add tetrahedral interpolation to both to address ErazorTT's problem case.
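
The point about the LUT stage being tiny can be put as a rough Amdahl's-law estimate. This is only a sketch: the 5% share and the 3x stage speedup are hypothetical numbers chosen for illustration, not measurements from this thread, and overall_speedup is a made-up helper name.

```python
def overall_speedup(stage_fraction, stage_speedup):
    """Amdahl's law: whole-pipeline speedup when a single stage gets faster.

    stage_fraction: share of per-frame time spent in that stage (0..1)
    stage_speedup:  how many times faster that stage becomes
    """
    return 1.0 / ((1.0 - stage_fraction) + stage_fraction / stage_speedup)

# Hypothetical figures: the LUT takes 5% of the frame time and becomes 3x faster.
print(f"{overall_speedup(0.05, 3.0):.3f}x overall")
```

With a share that small, even a much faster LUT stage moves the total encode speed by only a few percent, which would be consistent with the near-identical x265 timings above.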

Post by Rocky »

Guest 2 wrote:
Sun Aug 07, 2022 10:05 am
Rocky wrote:
Sun Aug 07, 2022 8:53 am
Can you tell me about DTL's workarounds? Any links?
https://forum.doom9.org/showthread.php?t=183517

It's a long thread, where he seemed to go thru some of your issues.
I read the whole thread and didn't see anything relevant. Did I miss something?

Post by Guest 2 »

Rocky wrote:
Tue Aug 09, 2022 11:15 am
I compared scripts with BlankClip() source and no conversions and things are as expected.
Please, post your script.
Rocky wrote:
Tue Aug 09, 2022 6:14 pm
I read the whole thread and didn't see anything relevant. Did I miss something?
:oops:

Post by Rocky »

You are forgiven. :P

Here is a new version supporting tetrahedral interpolation. It addresses ErazorTT's issue, as the artifacts do not occur with tetrahedral. Please read the new DGCube.txt file for details and be aware that the filter is now invoked as DGCube(). I'll add this to the timecube-derived Cube() filter as well. Also need to add Vapoursynth support to DGCube().

https://rationalqm.us/misc/DGCube.zip

Please let the Doom9 guys know about this.

The script you asked for:

loadplugin("D:\Don\Programming\C++\Avisynth filters\DGCube\x64\Release\dgcube.dll")
BlankClip(pixel_type="RGBP16", width=3840, height=2160, length=1000)
DGCube("IDENTITY.cube")
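
For readers unfamiliar with the tetrahedral interpolation mentioned above, here is a small pure-Python sketch of the general technique. It is not DGCube's CUDA code. The LUT is assumed to be a dict mapping integer grid indices (i, j, k) to output RGB tuples, and the names tetrahedral and identity are made up for illustration.

```python
def tetrahedral(lut, n, rgb):
    """Interpolate one RGB value (components in 0..1) through an n x n x n LUT."""
    # Position on the grid, clamped so the enclosing cell always exists.
    pos = [min(max(c, 0.0), 1.0) * (n - 1) for c in rgb]
    base = [min(int(p), n - 2) for p in pos]
    frac = [p - b for p, b in zip(pos, base)]

    # The ordering of the fractions picks one of the six tetrahedra in the cell.
    # Walk from the low corner to the high corner, stepping along the axis
    # with the largest fraction first.
    order = sorted(range(3), key=lambda axis: frac[axis], reverse=True)
    corner = list(base)
    prev = lut[tuple(corner)]
    out = list(prev)
    for axis in order:
        corner[axis] += 1
        cur = lut[tuple(corner)]
        out = [o + frac[axis] * (c - p) for o, c, p in zip(out, cur, prev)]
        prev = cur
    return tuple(out)

# Identity LUT of size 3, built in memory for the example.
n = 3
identity = {
    (i, j, k): (i / (n - 1), j / (n - 1), k / (n - 1))
    for i in range(n) for j in range(n) for k in range(n)
}
print(tetrahedral(identity, n, (0.2, 0.7, 0.4)))
```

Each output blends only four of the eight cell corners, and for neutral input (r = g = b) the weights of the two intermediate corners drop to zero, leaving a blend of just the low and high corners of the cell.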

Post by Guest 2 »

Rocky wrote:
Wed Aug 10, 2022 9:04 am
Please let the Doom9 guys know about this.
Your wish is my command. :salute:

P.S: Will you add tetrahedral to AVSCube too?

Post by Guest 2 »

Rocky wrote:
Wed Aug 10, 2022 9:04 am
The script you asked for
With a commercial (BBC) LUT:

AVSCube no prefetch
Number of frames: 1000
Length (hh:mm:ss.ms): 00:00:41.667
Frame width: 3840
Frame height: 2160
Framerate: 24.000 (24/1)
Colorspace: RGBP16
Audio channels: 1
Audio bits/sample: 16
Audio sample rate: 44100
Audio samples: 1837500

Frames processed: 1000 (0 - 999)
FPS (min | max | average): 8.370 | 14.84 | 14.11
Process memory usage (max): 116 MiB
Thread count: 9
CPU usage (average): 8.6%

GPU usage (average): 4%
VPU usage (average): 0%
GPU memory usage: 658 MiB
GPU Power Consumption (average): 11.7 W

AVSCube 6 threads
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 35.30 | 90.92 | 59.82
Process memory usage (max): 1259 MiB
Thread count: 18
CPU usage (average): 61.6%

GPU usage (average): 3%
VPU usage (average): 0%
GPU memory usage: 658 MiB
GPU Power Consumption (average): 11.8 W

DGCube no prefetch
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 21.50 | 45.00 | 41.58
Process memory usage (max): 221 MiB
Thread count: 13
CPU usage (average): 9.5%

GPU usage (average): 66%
VPU usage (average): 0%
GPU memory usage: 867 MiB
GPU Power Consumption (average): 49.1 W

DGCube 6 threads
Frames processed: 1000 (0 - 999)
FPS (min | max | average): 28.36 | 81.98 | 51.89
Process memory usage (max): 1755 MiB
Thread count: 24
CPU usage (average): 61.6%

GPU usage (average): 85%
VPU usage (average): 0%
GPU memory usage: 1910 MiB
GPU Power Consumption (average): 50.9 W

Interesting, it seems to hit a wall. :)
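
To put a number on the wall, the scaling from no prefetch to 6 threads can be worked out from the average FPS figures above. This is just the arithmetic as a short Python sketch; the dictionary layout and the helper name scaling are arbitrary.

```python
# Average FPS figures copied from the AVSMeter runs above.
runs = {
    "AVSCube": {"no_prefetch": 14.11, "six_threads": 59.82},
    "DGCube": {"no_prefetch": 41.58, "six_threads": 51.89},
}

def scaling(result):
    """Ratio of the 6-thread average FPS to the no-prefetch average FPS."""
    return result["six_threads"] / result["no_prefetch"]

for name, result in runs.items():
    print(f"{name}: {scaling(result):.2f}x from 6 threads")
```

That works out to about 4.24x for AVSCube and about 1.25x for DGCube.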

Rocky, do you think we could give fmtconv a try instead of z? I have tried to look at the documentation and it's a bit obscure to me :)

P.S Some day I will ask you about nVidia DALI :mrgreen:

Post by Rocky »

I was trying to get fmtc working today but failed. I'll try again after Vapoursynth support.

Post by Rocky »

Please re-download to get:

* Vapoursynth support.
* 'device' parameter to select GPU device.
* Updated user manual.

Salvadore?

Post by Rocky »

Actually, I'm not going to add tetrahedral to timecube, because of all the assembler intrinsics stuff, for which I am neither qualified nor motivated. And it gives a raison d'etre for DGCube. :lol:

Post by Guest 2 »

Rocky wrote:
Fri Aug 12, 2022 10:29 am
And it gives a raison d'etre for DGCube.
:mrgreen:

Do you think it's possible to have the necessary color space conversion ported to CUDA, to offload the cpu as much as possible?

OpenCV supports it easily and, if I am not wrong, it's written in CUDA... so...

Post by Rocky »

Guest 2 wrote:
Sat Aug 13, 2022 2:40 am
Do you think it's possible to have the necessary color space conversion ported to CUDA, to offload the cpu as much as possible?
Sure. Bullwinkle already mentioned that.

Post by Guest 2 »

Rocky wrote:
Sat Aug 13, 2022 8:11 am
Sure. Bullwinkle already mentioned that.
:salute:

Post by Rocky »

It's kicking my patootie. Maybe I should call in Britney.
Sherman
Posts: 576
Joined: Mon Jan 06, 2020 10:19 pm

Post by Sherman »

Did you want me to try?

Post by Rocky »

Aren't you busy with your new tube tester?