Port Cube

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Annnnd sending 2 / 4 frames to video memory, stitched as a single transfer, to make them process and moved back as single transaction again?

I was also thinking that it's possible to need to have to apply more than one cube at a time (ex: technical, artistic, technical again) and it would be a pity to transfer and convert data more than once for the same frame(s).
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

That's the whole point of our CUDASynth proposal.

Stitching requires lots more GPU memory for minimal gain. It's not the overhead of memory transfer management that is the bottleneck. It's the data raw transfer time for large frames.

But I can certainly put CUDASynth into my filters, so that for example the following script has to transfer only one decoded frame on the PCIe, i.e., the final frame going back to the CPU.

DGSource(appropriate CUDASynth params) # only compressed data goes up to GPU, decoded frame is left on GPU
Cube(appropriate CUDASynth params) # uses frame on GPU, writes to GPU memory
Cube(appropriate CUDASynth params) # uses frame on GPU, writes to GPU memory
Cube(appropriate CUDASynth params) # uses frame on GPU, writes to CPU and returns it to Avisynth

Any CUDASynth enabled filter could be used in the chain. Any version of Avisynth or Vapoursynth could be used. Really guys, what's not to like? :?

Remember, I tried to interest developers in this years ago but no-one was interested. I even gave them a sample filter showing how to do it. Zero interest. Fine, I'll just do it for all my filters, and when people see how fast things can be maybe they will give it a thought.

OK, back to coding the DGCube() changes we've agreed on.
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Avisynth+ CUDA stuff moved to a new thread. Please follow up there on that subject.

Also, I updated my last post above.
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Got the new scheme coded and tested for RGBP16. Moving on to YUV420P16. Probably won't be done until tomorrow.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Rocky wrote:
Sun Aug 28, 2022 12:36 pm
Got the new scheme coded and tested for RGBP16. Moving on to YUV420P16. Probably won't be done until tomorrow.
:salute:
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

I'm thinking of not implementing full range input/output for YUV. You could still have a full or limited range LUT. For RGB, full range or limited would be allowed for input, lut, and output.

Will anyone pop an aneurysm if I do that? YUV from our disks, etc., is limited range.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Rocky wrote:
Mon Aug 29, 2022 12:29 pm
Will anyone pop an aneurysm if I do that? YUV from our disks, etc., is limited range.
YUV from "usual disks" is limited TV range but YUV from some cameras (even pro ones like Arri) can be either limited or full.

While you have hands in it, do it the right way, or, at least, plan for future release.
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

do it the right way
t, this stuff is hard. Lots of equations rigorously derived from specs. That's why there is no existing solution.

I'll plan for it and give an interim release without it. We need some samples for testing. Got any?
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Rocky wrote:
Mon Aug 29, 2022 1:01 pm
Got any?
Sure, let me see if I have some Sony Slog3 Full PC Range or similar. Perhaps tomorrow morning (CET).
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Guest 2 wrote:
Mon Aug 29, 2022 1:09 pm
Sony Slog3
OK.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Rocky wrote:
Mon Aug 29, 2022 1:44 pm
OK.
I have found a Canon Log3 file, YUV 4:2:2 full range:

https://krakenfiles.com/view/3hCxREqNdC/file.html

Sorry for the elephant size but I don't really know how to split it without doing damages.

You need to index with LWLibavVideoSource, as nVidia doesn't support 4:2:2 (yet).

Here is the cube, pay attention that it converts Log3 full to 709 full:

https://krakenfiles.com/view/xSAX60AqLZ/file.html

You can find more LUTs here https://tools.rodrigopolo.com/canonluts/

And a tool to play with them here https://cameramanben.github.io/LUTCalc/ ... index.html
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

That's the best you've got? OK, I'll try to work with it. :D

Meanwhile, here is something to play with. Re-download DGCube.zip. It implements the new interface and should be good for everything except for full range YUV, which I am still working on. Refer to the document for details. Changes were extensive so your thorough testing will be greatly appreciated and amply rewarded.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Script error: the named argument "in" to DGCube had the wrong type.

DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in=1, lut=0, out=1, interp="tetrahedral")

Why oh why.

P.S: It seems that we need to use "full" or "lim" (much better than 0 and 1). Please update txt. :D
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Number of frames: 1792
Length (hh:mm:ss.ms): 00:01:14.741
Frame width: 3840
Frame height: 2064
Framerate: 23.976 (24000/1001)
Colorspace: YUV420P10

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")


Frames processed: 1792 (0 - 1791)
FPS (min | max | average): 1.389 | 6.633 | 5.533
Process memory usage (max): 614 MiB
Thread count: 14
CPU usage (average): 8.4%

GPU usage (average): 16%
VPU usage (average): 10%
GPU memory usage: 1886 MiB
GPU Power Consumption (average): 42.5 W

Time (elapsed): 00:05:23.864

DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="lim", lut="full", out="lim")

Frames processed: 1792 (0 - 1791)
FPS (min | max | average): 2.799 | 57.32 | 27.42
Process memory usage (max): 543 MiB
Thread count: 12
CPU usage (average): 8.4%

GPU usage (average): 56%
VPU usage (average): 42%
GPU memory usage: 1826 MiB
GPU Power Consumption (average): 54.5 W

Time (elapsed): 00:01:05.359
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Thank you for testing. Looks good so far.

Congrats on guessing new syntax. Never underestimate the t!

Text document was updated.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Rocky wrote:
Wed Aug 31, 2022 11:09 am
Thank you for testing.
I see tiny discrepancies between the product of external (identical to AVSCube) and of internal processing (look at graphs, mostly).

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="lim", lut="full", out="lim")

Image

In limited LUT, the result on internal processing is totally wrong:

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
Cube("D:\Programmi\Media\AviSynth+\cube\WarnerBros_PQToHLG_MaxCLL_1000.cube", fullrange=true)
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:limited=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\WarnerBros_PQToHLG_MaxCLL_1000.cube", in="lim", lut="lim", out="lim")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:limited=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

DGCube("D:\Programmi\Media\AviSynth+\cube\WarnerBros_PQToHLG_MaxCLL_1000.cube", in="lim", lut="lim", out="lim")

Image
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Rocky wrote:
Wed Aug 31, 2022 11:09 am
Text document was updated.
Please, pay attention to Avisynth and Vapoursynth Examples sections, as you are using the old parameters.
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Text document was updated.
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Guest 2 wrote:
Wed Aug 31, 2022 12:18 pm
I see tiny discrepancies between the product of external (identical to AVSCube) and of internal processing (look at graphs, mostly).
Let's look at that after we fix your big issue.
In limited LUT, the result on internal processing is totally wrong
Confirmed. I have it fixed for CPU code. Just have to port it to GPU and do some regression testing, and then I can give you a new test version.
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Please re-download and test. Should be working for lim-lim-lim YUV. If you confirm it is working we can look at the tiny discrepancies.
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Rocky wrote:
Fri Sep 02, 2022 7:29 am
Should be working for lim-lim-lim YUV.
Yes, it works now.
Rocky wrote:
Fri Sep 02, 2022 7:29 am
If you confirm it is working we can look at the tiny discrepancies.
1) BBC PQ to HLG, external conversion

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:full", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="full", lut="full", out="full")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:full=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

2) BBC PQ to HLG, internal conversion

DGCube("D:\Programmi\Media\AviSynth+\cube\1a_PQ1000_HLG_mode-nar_in-nar_out-nar_nocomp.cube", in="lim", lut="full", out="lim")

Image

3) Warner PQ to HLG, external conversion

z_ConvertFormat(pixel_type="RGBP16", colorspace_op="2020:st2084:2020:limited=>rgb:st2084:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")
DGCube("D:\Programmi\Media\AviSynth+\cube\WarnerBros_PQToHLG_MaxCLL_1000.cube", in="lim", lut="lim", out="lim")
z_ConvertFormat(pixel_type="YUV420P10", colorspace_op="rgb:std-b67:2020:limited=>2020:std-b67:2020:limited", resample_filter_uv="spline64", dither_type="error_diffusion")

Image

4) Warner PQ to HLG, internal conversion

DGCube("D:\Programmi\Media\AviSynth+\cube\WarnerBros_PQToHLG_MaxCLL_1000.cube", in="lim", lut="lim", out="lim")

Image

1) and 3) are identical to my eyes and according to graphs
2) and 4) are identical to my eyes and according to graphs
1) and 2), such as 3) and 4) have some discrepancies, both in luma and color (look at the area around the smiley)

4k frames (same order):

Image

Image

Image

Image
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Good, thank you.

When you post new stream results, I' need to get the source stream in order to directly compare to your results. So can I get that stream please?
DAE avatar
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Port Cube

Post by Guest 2 »

Rocky wrote:
Fri Sep 02, 2022 12:58 pm
So can I get that stream please?
Mmm... hard to decide what piece to send. There are some parts that show the difference better and they are all really far from each other.

The m2ts is 96.062.312.448 bytes. :)

Here is a piece with broad enough luma and chroma ranges.

https://krakenfiles.com/view/mABFkMxsgb/file.html
User avatar
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Port Cube

Post by Rocky »

Thank you.

Regarding 1-2 comparison. I guess it's because you use STD B67 for output. I don't do any conversions like that. You did that because you think it's appropriate for HLG? DGCube doesn't know what kind of LUT you are using. As I mentioned before, unless I implement the whole of z to accommodate any conversion you want, there will be discrepancies. I'm not excited about doing that so unless there is a middle ground, maybe best to just drop internal conversion.

There's also your dithering, which I do not do.

Anyway, let me investigate some more.
User avatar
Britney
Posts: 145
Joined: Sun Aug 09, 2020 3:24 pm

Port Cube

Post by Britney »

Some tiny discrepancies here too:

Post Reply