CUDASynth

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
DJATOM
Posts: 176
Joined: Fri Oct 16, 2015 6:14 pm

Re: CUDASynth

Post by DJATOM »

The avscompat layer seems to be working, but the speed is nearly the same as in "cpu" mode.
It is still relatively fast: about 65 fps (default settings, DGSource -> DGDenoise -> DGSharpen) and about 105 fps (default settings, DGSource -> DGDenoise).
Hardware: GTX 750, i5-4670k
PC: RTX 2070 | Ryzen R9 5950X (no OC) | 64 GB RAM
Notebook: RTX 4060 | Ryzen R9 7945HX | 32 GB RAM
admin
Posts: 4551
Joined: Thu Sep 09, 2010 3:08 pm

Re: CUDASynth

Post by admin »

Thanks for the results, DJ. Just out of interest I'd like to see a benchmark of a script for these three (no CUDASynth):

Avisynth+
Vapoursynth native
Vapoursynth avscompat

I seem to recall when doing some testing recently both Vapoursynth ways fell short compared to Avisynth+, but I haven't tried it recently.

Re: CUDASynth

Post by DJATOM »

OK, for now I've checked the same script:
ClearAutoloadDirs()
LoadPlugin("C:\322\x64\DGDecodeNV.dll")
DGSource("J:\Darling6\STREAM\EP16.dgi")
DGDenoise()
DGSharpen()
trim(0,6000)
and
ClearAutoloadDirs()
LoadPlugin("C:\322\x64\DGDecodeNV.dll")
DGSource("J:\Darling6\STREAM\EP16.dgi",fdst="gpu0")
DGDenoise(fsrc="gpu0",fdst="gpu0")
DGSharpen(fsrc="gpu0",fdst="cpu")
trim(0,6000)
So CUDASynth works in native avs+.
C:\322>avs2yuv64 EP16.avs -o NUL
Avs2YUV 0.28
Script file: EP16.avs
Resolution: 1920x1080
Frames per sec: 24000/1001 (23.976)
Total frames: 6001
CSP: YV12
Progress Frames FPS Elapsed Remain
[100.0%] 6000/6001 86.72 0:01:09 0:00:00
Started: Tue Oct 9 00:24:34 2018
Finished: Tue Oct 9 00:25:43 2018
Elapsed: 0:01:09

C:\322>avs2yuv64 EP16.avs -o NUL
Avs2YUV 0.28
Script file: EP16.avs
Resolution: 1920x1080
Frames per sec: 24000/1001 (23.976)
Total frames: 6001
CSP: YV12
Progress Frames FPS Elapsed Remain
[100.0%] 6000/6001 102.32 0:00:58 0:00:00
Started: Tue Oct 9 00:26:26 2018
Finished: Tue Oct 9 00:27:25 2018
Elapsed: 0:00:59
I'll measure avscompat (with and without fsrc/fdst) soon; I need to close the browser first to free up GPU RAM for testing.
And since there are no native Vapoursynth versions of DGDenoise/DGSharpen, should I test those through avscompat with DGSource in native mode?
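The two avs2yuv runs above can be compared with a few lines of Python. This is only a sketch using the figures printed in the logs; it assumes the logs appear in the same order as the two scripts (plain first, fsrc/fdst second), and the helper names are made up for illustration.

```python
# Figures copied from the two avs2yuv logs above.
FRAMES = 6000
FPS_PLAIN = 86.72       # assumed: script without fsrc/fdst
FPS_CUDASYNTH = 102.32  # assumed: script with fsrc/fdst


def speedup(baseline_fps: float, new_fps: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    return new_fps / baseline_fps


def seconds_for(frames: int, fps: float) -> float:
    """Estimate the wall-clock seconds needed to process `frames` at `fps`."""
    return frames / fps


if __name__ == "__main__":
    print(f"speedup: {speedup(FPS_PLAIN, FPS_CUDASYNTH):.2f}x")
    print(f"plain: {seconds_for(FRAMES, FPS_PLAIN):.0f} s")
    print(f"with fsrc/fdst: {seconds_for(FRAMES, FPS_CUDASYNTH):.0f} s")
```

The estimated times line up with the elapsed times in the logs (about 69 s and 59 s), which works out to a gain of roughly 18% for this script on this hardware.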
Guest

Re: CUDASynth

Post by Guest »

LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
DGSource("I:\test.dgi", fieldop=0, fulldepth=True)
ConvertBits(10)
FPS 92.3
import vapoursynth as vs
core = vs.get_core()
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'I:\test.dgi', fieldop=0, fulldepth=True)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
FPS 133.0
import vapoursynth as vs
core = vs.get_core()
core.avs.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.avs.DGSource("I:/test.dgi", fieldop=0, fulldepth=True)
clip = core.resize.Point(clip, format=vs.YUV420P10)
clip.set_output()
FPS 129.8
Source was a 4K clip

Edit
Avs compatibility mode is 2x faster with CUDASynth than with the regular version (4K sample, DGHDRtoSDR and DGSharpen at default settings).

Re: CUDASynth

Post by DJATOM »

cudasynth in avscompat:
import vapoursynth as vs
core = vs.get_core()

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
clip = core.avs.DGSharpen(clip, fsrc="gpu0", fdst="cpu")
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
[image not preserved]

no cudasynth in avscompat:
import vapoursynth as vs
core = vs.get_core()

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi')
clip = core.avs.DGDenoise(clip)
clip = core.avs.DGSharpen(clip)
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
[image not preserved]

native DGSource + avscompat DGDenoise and DGSharpen:
import vapoursynth as vs
core = vs.get_core()

core.std.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')

clip = core.dgdecodenv.DGSource(r'J:\Darling6\STREAM\EP16.dgi')
clip = core.avs.DGDenoise(clip)
clip = core.avs.DGSharpen(clip)
clip = core.std.Trim(clip, 0, 6000)
clip.set_output()
[image not preserved]

I don't know why we get such results. I tried to keep resource usage as similar as possible between runs (browser closed, etc.).

Re: CUDASynth

Post by Guest »

DJATOM

Two items.
First, I don't know whether DGDenoise is actually CUDASynth-enabled yet.
Second, these lines
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
clip = core.avs.DGSharpen(clip, fsrc="gpu0", fdst="cpu")
should actually be
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu1")
clip = core.avs.DGSharpen(clip, fsrc="gpu1", fdst="cpu")
to get the ping pong effect
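The alternating assignment suggested above can be generated mechanically. The sketch below assumes the chain simply alternates between the two buffer names gpu0 and gpu1 and that the last filter hands the frame back to the CPU; whether the alternation is actually required is not confirmed in this post, and `assign_buffers` and `render` are hypothetical helper names, not part of DGDecodeNV.

```python
from typing import Dict, List


def assign_buffers(filters: List[str]) -> List[Dict[str, str]]:
    """Assign fsrc/fdst to a source filter followed by GPU filters.

    Assumption: buffers alternate gpu0, gpu1, gpu0, ... along the chain,
    and the final filter writes to "cpu".
    """
    plan = []
    last = len(filters) - 1
    for i, name in enumerate(filters):
        entry = {"name": name}
        if i > 0:
            # Read from the buffer the previous filter wrote to.
            entry["fsrc"] = f"gpu{(i - 1) % 2}"
        entry["fdst"] = "cpu" if i == last else f"gpu{i % 2}"
        plan.append(entry)
    return plan


def render(plan: List[Dict[str, str]]) -> List[str]:
    """Format each entry as a call with its fsrc/fdst arguments."""
    lines = []
    for entry in plan:
        args = ", ".join(f'{k}="{v}"' for k, v in entry.items() if k != "name")
        lines.append(f'{entry["name"]}({args})')
    return lines


if __name__ == "__main__":
    for line in render(assign_buffers(["DGSource", "DGDenoise", "DGSharpen"])):
        print(line)
```

For the three-filter chain used in this thread, this reproduces the gpu0 -> gpu1 -> cpu arrangement shown in the corrected lines.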

Re: CUDASynth

Post by DJATOM »

Oh, I thought gpu0/gpu1 were for a two-card setup (I have only one).

Re: CUDASynth

Post by Guest »

I only have one card as well
I think it has to do with the pipelines/kernels???

Try it and see if it makes a difference

Re: CUDASynth

Post by DJATOM »

Tried and...
[image not preserved]

Re: CUDASynth

Post by Guest »

Could you check what your GPU usage is while running the script?

Re: CUDASynth

Post by admin »

Thanks, guys, awesome!

DGDenoise and DGSharpen are both CUDASynth-enabled.

Meanwhile, there is another limitation I discovered. Some 3rd party players and encode apps open the script multiple times. That will not work with CUDASynth as currently designed because there can be only one pipeline. I think I can fix that up fairly easily by having only the first source filter set up the framework.
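The idea of letting only the first source filter set up the framework resembles a guarded one-time initialization. The Python sketch below illustrates that general pattern only; it is not how DGDecodeNV is implemented, and `Pipeline` and `get_pipeline` are hypothetical names.

```python
import threading
from typing import Optional


class Pipeline:
    """Stand-in for the single shared framework (hypothetical)."""

    def __init__(self) -> None:
        self.users = 0


_pipeline: Optional[Pipeline] = None
_lock = threading.Lock()


def get_pipeline() -> Pipeline:
    """Create the pipeline on the first call; later calls reuse it."""
    global _pipeline
    with _lock:
        if _pipeline is None:
            # Only the first caller (the first source filter) gets here.
            _pipeline = Pipeline()
        _pipeline.users += 1
        return _pipeline


if __name__ == "__main__":
    first = get_pipeline()   # e.g. the script opened the first time
    second = get_pipeline()  # e.g. the same script opened again
    print(first is second)
```

With this shape, an application that opens the script several times would attach to the existing pipeline instead of trying to create a second one.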

Also, I have CUDASynth-enabled DGPQtoHLG. I'll make a release tomorrow after some testing.
User avatar
hydra3333
Posts: 394
Joined: Wed Oct 06, 2010 3:34 am

Re: CUDASynth

Post by hydra3333 »

Extremely nice work, DG. Thank you.

Edit: to clear up my earlier lack of clarity, in the context of the new pipeline-enabled DGDecodeNV.dll and the aforementioned test scripts, with snippets like
(a)

core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
clip = core.dgdecodenv.DGSource(r'I:\test.dgi', fieldop=0, fulldepth=True)
and
(b)

core.avs.LoadPlugin(r'C:\322\x64\DGDecodeNV.dll')
clip = core.avs.DGSource(r'J:\Darling6\STREAM\EP16.dgi', fdst="gpu0")
clip = core.avs.DGDenoise(clip, fsrc="gpu0", fdst="gpu0")
edit: added LoadPlugin to snippet (b) for clarity

and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-CUDASynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and the CUDASynth DGDecodeNV.dll in snippet (b) with ".avs."?

Hmm, I must not have caught up with the latest, as my (long not updated) scripts have continued to use "core.avs.LoadPlugin" and "core.avs.DGSource" rather than "core.std.LoadPlugin" and "core.dgdecodenv.DGSource" ... damn, I must get in from the scrub, outa the midday sun. https://www.youtube.com/embed/z2YvYiWto ... &version=3
I really do like it here.

Re: CUDASynth

Post by Guest »

and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-CUDASynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and the CUDASynth DGDecodeNV.dll in snippet (b) with ".avs."?
Yes

Re: CUDASynth

Post by admin »

and per the CUDASynth.txt "* Vapoursynth is not yet supported", would it be correct to say the original non-CUDASynth DGDecodeNV.dll is used in snippet (a) with ".dgdecodenv." and the CUDASynth DGDecodeNV.dll in snippet (b) with ".avs."?
No. The dgdecodenv versus avs decides whether the referenced DLL is invoked natively or via the avscompat layer. Either way, you would still load the same DLL. However, the CUDASynth DLL can only be loaded with avscompat at this time. If you omit the load plugin call then you could pick up something from autoloading. I recommend always using explicit loading.

Re: CUDASynth

Post by Guest »

Snippet (a) is the one I used in the benchmarking you asked for, and it uses the original non-CUDASynth DLL
Snippet (b) is the one from DJATOM's testing of the cudasynth dll

Re: CUDASynth

Post by admin »

I don't see a loadplugin call in snippet b so it's ambiguous. Also, snippet b leaves the output on the GPU. The last filter should output it to the CPU.

To get precise answers, one needs to ask precise questions. ;)
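The two rules implied here (each filter reads where the previous one wrote, and the last filter outputs to the CPU) can be checked before running a script. The following is a rough sketch under those assumptions; `check_chain` is a hypothetical helper, and the tuples just restate the fsrc/fdst values from the scripts in this thread.

```python
from typing import List, Optional, Tuple

# (filter name, fsrc, fdst); fsrc is None for a source filter.
Stage = Tuple[str, Optional[str], str]


def check_chain(stages: List[Stage]) -> List[str]:
    """Return a list of problems found in an fsrc/fdst chain."""
    problems = []
    # Assumed rule: a filter's fsrc must match the previous filter's fdst.
    for prev, cur in zip(stages, stages[1:]):
        if cur[1] != prev[2]:
            problems.append(
                f"{cur[0]} reads {cur[1]} but {prev[0]} wrote {prev[2]}"
            )
    # The last filter should deliver the frame to the CPU.
    if stages and stages[-1][2] != "cpu":
        problems.append(
            f"{stages[-1][0]} leaves the output on {stages[-1][2]}, not cpu"
        )
    return problems


snippet_b = [("DGSource", None, "gpu0"), ("DGDenoise", "gpu0", "gpu0")]
full_chain = [
    ("DGSource", None, "gpu0"),
    ("DGDenoise", "gpu0", "gpu1"),
    ("DGSharpen", "gpu1", "cpu"),
]

if __name__ == "__main__":
    print(check_chain(snippet_b))
    print(check_chain(full_chain))
```

Run on snippet (b), the check reports that DGDenoise leaves the output on the GPU, while the full three-filter chain passes.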

Re: CUDASynth

Post by admin »

CUDASynth 0.2:

* Added CUDASynth-enabled DGPQtoHLG.

* Revised the user manual: explain meaning of gpu0/1 (not different cards!),
added note that Vapoursynth can be used in avscompat mode, and mention
limitation of some players and third-party apps.

http://rationalqm.us/misc/CUDASynth_0.2.rar

Re: CUDASynth

Post by admin »

Regarding the test results you guys gave, I'm having a little trouble digesting it as you have included CUDASynth results when I specifically asked you to exclude it. Also, there seems to be some confusion about native versus avscompat, etc. Finally, we want the prefetch() call for Avisynth+, otherwise we throw away some performance. Tell you what, I'll do some testing and post full results with full scripts and we can go from there.

One of my motivations here is to know whether using the avscompat layer for Vapoursynth loses performance versus native. To be honest, I'd like to know why I should bother with the PITA of duplicating code to have native Vapoursynth if avscompat performs the same. Even if I have to release an avscompat.dll, that's way easier than writing Vapoursynth native code for everything. Any thoughts?

Re: CUDASynth

Post by DJATOM »

Yeah, it's possible to set up autoloading with a hand-written Python module (as I did before the native DGSource version came out), but it's, say, a waste of time to type another line in the script. So I'd like to have native versions if possible. I almost don't use avs+ nowadays; I moved to VS about a year ago :lol:

Re: CUDASynth

Post by hydra3333 »

admin wrote:
Tue Oct 09, 2018 4:38 pm
Tell you what, I'll do some testing and post full results with full scripts and we can go from there.
Beaut ! :hat:
admin wrote:
Tue Oct 09, 2018 4:38 pm
One of my motivations here is to know whether using the avscompat layer for Vapoursynth loses performance versus native. To be honest, I'd like to know why I should bother with the PITA of duplicating code to have native Vapoursynth if avscompat performs the same. Even if I have to release an avscompat.dll, that's way easier than writing Vapoursynth native code for everything. Any thoughts?
An eminently reasonable line of reasoning :) Maybe also ask a question over at the other site as to what may or may not be forgone when using the avscompat layer for Vapoursynth versus native? With any luck the VS author may provide some insight.

Re: CUDASynth

Post by hydra3333 »

DJATOM wrote:
Tue Oct 09, 2018 5:56 pm
Yeah, it's possible to set up autoloading with a hand-written Python module (as I did before the native DGSource version came out), but it's, say, a waste of time to type another line in the script. So I'd like to have native versions if possible. I almost don't use avs+ nowadays; I moved to VS about a year ago :lol:
Being a control freak from way back (too many systems went belly up if decent control was omitted during development) I always manually load everything and don't begrudge a line or 20 of code :) Personal preference.

rar ? google tells me
What does RAR stand for in finance?
Abbr. Meaning
RAR Revenue Agent Report (US IRS)
RAR Refund-Anticipated Return
RAR Run At Risk
RAR Regulatory Asset Ratio (finance)
3rd seems about right given the vsrepo experience with 7z :D

Re: CUDASynth

Post by Guest »

If you go this route all I really need to do is change my templates to be avs compatible.
So, all is good
Now that I think about it, the only reason I moved to vs was the lack of high bit depth support in the avisynth chain (NVEncC)
That has been corrected though

Re: CUDASynth

Post by admin »

Good points, guys, thanks.

I have found a simple script that runs perfectly in Avisynth+ but runs at half speed and then stops completely in Vapoursynth native (no CUDASynth in either case). I want to check a few things first and then I'll give you the script to see if you can replicate it. Then we'll have to try to figure out what is going wrong.

Re: CUDASynth

Post by admin »

I found the cause of the Vapoursynth slowdown and stoppage. DGHDRtoSDR was missing a freeFrame(src) call (affecting only the Vapoursynth code) and so memory was being exhausted. I'll release a fix later today and then get back to proper benchmarking.
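To illustrate why one missing free call per frame eventually stops processing, here is a toy Python model of a filter that takes frames from a fixed-size pool. It is a simplified sketch, not the Vapoursynth API or the DGHDRtoSDR code; `FramePool` and `run_filter` are hypothetical, and the pool size stands in for available memory.

```python
class FramePool:
    """Fixed-size pool standing in for frame memory (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0

    def acquire(self) -> None:
        if self.in_use >= self.capacity:
            raise MemoryError("frame pool exhausted")
        self.in_use += 1

    def release(self) -> None:
        self.in_use -= 1


def run_filter(pool: FramePool, frames: int, free_src: bool) -> int:
    """Process frames; return how many completed before the pool ran out."""
    done = 0
    for _ in range(frames):
        try:
            pool.acquire()  # source frame obtained from upstream
            pool.acquire()  # destination frame for the result
        except MemoryError:
            break
        pool.release()      # destination frame consumed downstream
        if free_src:
            pool.release()  # the step that plays the role of freeFrame(src)
        done += 1
    return done


if __name__ == "__main__":
    print(run_filter(FramePool(8), 100, free_src=True))
    print(run_filter(FramePool(8), 100, free_src=False))
```

With the source frame released, all frames complete; without it, each frame leaks one slot and the run stops once the pool is full, which mirrors the stoppage described above.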

Re: CUDASynth

Post by Guest »

Good to hear you are making headway