CUDASynth

These CUDA filters are packaged into DGDecodeNV, which is part of DGDecNV.
Rocky
Posts: 3623
Joined: Fri Sep 06, 2019 12:57 pm

Post by Rocky »

Here's something interesting. The spatial denoising kernel has six variants: three radiuses each for 8-bit and 16-bit. They use different constants for various things. These constants have to be known at compile time to allow loop unrolling, declaration of unsigned char versus unsigned int, etc. Maintaining that is a PITA as any small change has to be repeated 6 times. So yesterday I thought OK, let's make the kernel a macro (kernel templates are impossible or extremely kludgy for the CUDA driver API). Should be a walk in the park, right? Silly billy, the C++ preprocessor just isn't designed for this, and 6 hours later I surrendered. For example, the preprocessor cannot even emit a linefeed after each line of the multi-line macro. The output is all on one long line. Oh sure, no problem for the compiler but try code reading and debugging. And if parts of the macro have to be determined by calculations, watch out. :evil:

So this morning I thought, how hard can it be to write my own specialized preprocessor that understands everything needed to correctly emit multiple tailored kernels. And I had it working in less than two hours. Rah rah, sis boom bah! If I want to, I can straightforwardly make and maintain an arbitrary number of variants, for example, to support all the radiuses one could ever want.

See below how simple it is. The prefix file is the stuff at the top of the output .cu file that is not repeated. The input file specifies the kernel template that will be repeated with substitutions. For example, |NAME| gets replaced with NLM, NLM2, ... Some of the variant replacements are the same for all variants. I do it that way to allow for the variants to diverge in the future. Don't get picky on my code; I just dashed it off quick and dirty.

Code: Select all

#include <stdio.h>
#include <stdlib.h> // atof
#include <string>

void replaceAll(std::string& str, const std::string& from, const std::string& to)
{
	if (from.empty())
		return;
	size_t start_pos = 0;
	while ((start_pos = str.find(from, start_pos)) != std::string::npos)
	{
		str.replace(start_pos, from.length(), to);
		start_pos += to.length(); // In case 'to' contains 'from', like replacing 'x' with 'yx'
	}
}

int main(int argc, char *argv[])
{
	int i;
	FILE* fp, * wfp;
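	char line[1024];

	// Expect three arguments: prefix file, input (template) file, output file.
	if (argc != 4)
	{
		printf("Usage: %s <prefix file> <input file> <output file>\n", argv[0]);
		return 1;
	}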
#define NUM_VARIANTS 6
	const char *name[NUM_VARIANTS ] =
	{
		"NLM", "NLM2", "NLM3", "NLM_hdr", "NLM2_hdr", "NLM3_hdr"
	};
	const char* window_radius[NUM_VARIANTS ] =
	{
		"4", "6", "8", "4", "6", "8"
	};
	const char* block_radius[NUM_VARIANTS ] =
	{
		"2", "2", "2", "2", "2", "2"
	};
	const char* weight_threshold[NUM_VARIANTS ] =
	{
		"0.10f", "0.10f", "0.10f", "0.10f", "0.10f", "0.10f"
	};
	const char* threshold[NUM_VARIANTS ] =
	{
		"0.10f", "0.10f", "0.10f", "0.10f", "0.10f", "0.10f"
	};
	char inv_window_area[NUM_VARIANTS ][128];
	const char* type[NUM_VARIANTS ] =
	{
		"unsigned char", "unsigned char", "unsigned char", "unsigned int", "unsigned int", "unsigned int"
	};
	const char* factor1[NUM_VARIANTS ] =
	{
		"256.0f", "256.0f", "256.0f", "65536.0f", "65536.0f", "65536.0f"
	};
	const char* factor2[NUM_VARIANTS ] =
	{
		"255.0f", "255.0f", "255.0f", "65535.0f", "65535.0f", "65535.0f"
	};

	for (i = 0; i < NUM_VARIANTS ; i++)
	{
		double tmp;

		tmp = (1.0f / ((2 * atof(window_radius[i]) + 1) * (2 * atof(window_radius[i]) + 1)));
		sprintf_s(inv_window_area[i], 128, "%.8f", tmp);
	}

	fopen_s(&wfp, argv[3], "w");
	if (wfp == NULL)
	{
		printf("Couldn't open output file %s.\n", argv[3]);
		return 1;
	}
	fopen_s(&fp, argv[1], "r");
	if (fp == NULL)
	{
		printf("Couldn't open prefix file %s.\n", argv[1]);
		return 1;
	}
	while (fgets(line, 1024, fp) != NULL)
	{
		fputs(line, wfp);
	}
	fputs("\n", wfp);
	fclose(fp);
	for (i = 0; i < NUM_VARIANTS ; i++)
	{
		fopen_s(&fp, argv[2], "r");
		if (fp == NULL)
		{
			printf("Couldn't open input file %s.\n", argv[2]);
			return 1;
		}
		while (fgets(line, 1024, fp) != NULL)
		{
			std::string tmp = line;
			replaceAll(tmp, "|NAME|", name[i]);
			replaceAll(tmp, "|WINDOW_RADIUS|", window_radius[i]);
			replaceAll(tmp, "|BLOCK_RADIUS|", block_radius[i]);
			replaceAll(tmp, "|WEIGHT_THRESHOLD|", weight_threshold[i]);
			replaceAll(tmp, "|THRESHOLD|", threshold[i]);
			replaceAll(tmp, "|INV_WINDOW_AREA|", inv_window_area[i]);
			replaceAll(tmp, "|TYPE|", type[i]);
			replaceAll(tmp, "|FACTOR1|", factor1[i]);
			replaceAll(tmp, "|FACTOR2|", factor2[i]);
			fputs(tmp.c_str(), wfp); // write the expanded line directly; it may be longer than the 1024-byte buffer
		}
		fputs("\n", wfp);
		fclose(fp);
	}
	fclose(wfp);

	return 0;
}
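
To make the substitution step concrete, here is a minimal sketch that runs the same `replaceAll` loop over a single template line. The template line and the helpers `invWindowArea` and `expandLine` are hypothetical (the real kernel template is not shown in this thread); only the `|NAME|`-style placeholders and the 1 / (2r + 1)^2 formula come from the generator above.

```cpp
#include <cstdio>
#include <string>

// Same replacement loop as in the generator above.
static void replaceAll(std::string& str, const std::string& from, const std::string& to)
{
	if (from.empty())
		return;
	size_t pos = 0;
	while ((pos = str.find(from, pos)) != std::string::npos)
	{
		str.replace(pos, from.length(), to);
		pos += to.length(); // skip past the text just inserted
	}
}

// Hypothetical helper: 1 / (2r + 1)^2, formatted with "%.8f" like the generator.
static std::string invWindowArea(int radius)
{
	const double side = 2.0 * radius + 1.0;
	char buf[32];
	snprintf(buf, sizeof buf, "%.8f", 1.0 / (side * side));
	return buf;
}

// Hypothetical helper: expand one template line for one variant.
static std::string expandLine(std::string line, const std::string& name,
                              const std::string& type, int radius)
{
	replaceAll(line, "|NAME|", name);
	replaceAll(line, "|TYPE|", type);
	replaceAll(line, "|WINDOW_RADIUS|", std::to_string(radius));
	replaceAll(line, "|INV_WINDOW_AREA|", invWindowArea(radius));
	return line;
}
```

With a made-up template line such as `__global__ void |NAME|(|TYPE| *dst)`, calling `expandLine(line, "NLM2_hdr", "unsigned int", 6)` gives `__global__ void NLM2_hdr(unsigned int *dst)`.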

Post by Rocky »

Here is test5 adding support for DGDenoise() standalone.

https://rationalqm.us/misc/DGDecodeNV_test5.zip
hydra3333
Posts: 406
Joined: Wed Oct 06, 2010 3:34 am

Post by hydra3333 »

Rocky wrote:
Wed Feb 21, 2024 9:53 am
So this morning I thought, how hard can it be to write my own specialized preprocessor that understands everything needed to correctly emit multiple tailored kernels.
:ugeek: :salute: Guessing there may be some expertise associated with doing it ;) ... mortals may have had a challenge. I wonder if Rocky's pic could be updated to hold a Zeus-like thunderbolt, or an Odin-like horny hat.
Thanks !
I really do like it here.

Post by Rocky »

Thank you, you are very kind. I'll keep pretending to be modest, though.

Post by Rocky »

I stopped CUDASynth development temporarily to convert to Vapoursynth API4. I have that working now but still have to code review and regression test. Also, some DGDemux work has arisen.

Post by Rocky »

DGDemux work is done. Vapoursynth API4 is done. DGSharpen() is integrated into DGSource(). CUDA error checking revamped so every CUDA call is checked. Just gonna do some regression testing and update the notes doc, then I'll give y'all a test6 build.

Post by hydra3333 »

Rocky wrote:
Mon Feb 26, 2024 12:24 pm
Just gonna do some regression testing and update the notes doc ...
Ah, QA related actions, dear to my heart and it used to pay well enough ... you must hang around with old people too much, thinking such things need doing :salute:

Post by Rocky »

He he. My first job as a "real" engineer (degree completed) was release testing for a statistical multiplexer. Before that I was doing application engineering for voice telephony products. And before that I wrote technical manuals for a telephone switching system. What did you work on?

BTW, my testing revealed three bugs in test6 that I need to fix, so Bob's yer uncle. No worries, I already fixed two, one to go.

Post by hydra3333 »

Nice ! Also nice handle on the local colloquialisms :) Bob is my uncle so that works too. ;)

At uni (punch cards era) started working for the Catholic Church on a new education payroll system (didn't earn a halo), then worked for a finance company ... it was funny, since I did the "technical" IT stream at uni not the financial one, but $ are $.
Then offered a 17% salary bump by govt Transport in the 70s so I moved and early on played with distributed control systems and whatnot (fun, assembler et al and debugging using octal bit lights and single-stepping with flip switches on the front of PDPs). "Senior Systems Programmer" was the front title, later Vaxs (yay) and dabbled with Wangs (ugh). Did not at all play Moon Lander with a "brand new high tech" graphics green-screen and light-pen which was nearly unheard of here back then ... loaded via paper tape ... cough, nor became good enough to thrust up then land over the lunar mountains on flatter terrain :D Around then, green-screen "VDU"s came along here and were terribly headache-inducing, awful :( About then broke my neck semi-crushing a "C" but got away with it in the end ... learned to drink beer at the pub through a straw (not recommended) over some months, metal neck bracing and whatnot.
Stayed there a lifetime, you name it I dabbled in it, vanilla programming usually tech oriented, networking, project mgt, softer stuff, etc. Finished as I started, managing an internationally sourced control system with a lot of distributed independently operating semi-connected bits of equipment and systems, a decent slice customer facing, largeish data with all of the usual related matters and politics and "look at me, I've a great new idea even though I know nothing at all about it nor the business" knobs that go with such. "Don't make the papers" (i.e. do not bugger up and become the butt of inevitable newspaper reporting) was a local oft-repeated motto, QA was a real thing.
Glad to retire !

Post by Rocky »

I squished the three bugs and found another one. But I squished that too. Should have test6 after lunch. Boiled acorns and breaded fried grubs with hot sauce. Yummy! Britney made it for us. DG is trying it for the first time.

Oh, one question. semi-crushing a "C". Is that a surfing thing, or...

Post by Rocky »

Here is test6:

* Vapoursynth API4
* Integrated DGSharpen
* Updated Notes.txt

https://rationalqm.us/misc/DGDecodeNV_test6.zip

Post by hydra3333 »

Rocky wrote:
Tue Feb 27, 2024 11:14 am
semi-crushing a "C"
Crushed one of the C1 to C7 neck vertebrae such that the jellified anterior bottom of it was pushed forward in the neck, clearly visible in the xrays; time has fuzzed over which one it was ...
Please don't ask how, certain things we as (hopefully) rational people do not under any circumstances believe in managed to happen around that time and we both witnessed them :( We don't go there :)

Post by hydra3333 »

Rocky wrote:
Tue Feb 27, 2024 12:04 pm
Here is test6:
Thanks.

Happy to report simple tests with

OTA-mpeg2/deinterlacing/denoising(best,temporal)/sharpening
poor VHS-mpeg2/deinterlacing/denoising(best,temporal)/sharpening
h.265/h2s/denoising(good,spatial)/sharpening
caused no crashes and appeared to work as expected.

Although ... the attached pop-up showed itself once during multiple test runs on the same file sets (in vhs?) ... can't explain why only once :) perhaps it's the only time I noticed it although that seems unlikely.

Cheers
Attachments
pop-up.jpg

Post by Rocky »

Ha ha, that's freaky. It should display only when the result is not CUDA_SUCCESS. :? And exiting at that point would be bad for your encode.

I'll check it. Lemme know if it keeps happening.

See anything wrong here?

Code: Select all

#define ck(e) \
do \
{ \
	if (e != CUDA_SUCCESS) \
	{ \
		char cumessage[2048]; \
		const char* szErrName = NULL; \
		cuGetErrorName(e, &szErrName); \
		sprintf(cumessage, "%s at file %s line %d. Exiting...", szErrName, __FILE__, __LINE__); \
		MessageBox(NULL, cumessage, "DGSource", MB_OK | MB_ICONERROR | MB_TOPMOST | MB_SETFOREGROUND); \
		exit(0); \
	} \
} while (0)

Code: Select all

		// Create the decoder
		ck(cuvidCreateDecoder(&state->cuDecoder, &state->dci));

Post by Rocky »

I converted the macro to a static _inline. I'll keep an eye on it.
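
One thing a function fixes that the macro cannot: the macro pastes its argument `e` in twice, so on the failure path the wrapped call is executed a second time just to fetch the error name. A minimal sketch of the difference follows. Everything in it is a stand-in (`fakeCall`, `report`, the `Result` enum), so that it compiles without the CUDA headers; it is not the actual DGDecodeNV code.

```cpp
// Stand-ins for the CUDA result type so the sketch builds without the toolkit.
enum Result { SUCCESS = 0, FAILURE = 1 };

static int g_calls = 0;             // counts how often the wrapped call really runs
static Result g_reported = SUCCESS; // what the error handler was told

// Hypothetical API call that always fails.
static Result fakeCall()
{
	++g_calls;
	return FAILURE;
}

// Hypothetical error handler (stands in for the message box).
static void report(Result r) { g_reported = r; }

// Macro version: 'e' is expanded twice, as in the ck() macro above.
#define CK_MACRO(e) \
do \
{ \
	if ((e) != SUCCESS) \
		report(e); \
} while (0)

// Function version: the argument is evaluated once, before the function runs.
static inline void ckInline(Result e)
{
	if (e != SUCCESS)
		report(e);
}

// Return how many times fakeCall() ran under each wrapper.
static int callsViaMacro()  { g_calls = 0; CK_MACRO(fakeCall()); return g_calls; }
static int callsViaInline() { g_calls = 0; ckInline(fakeCall()); return g_calls; }
```

Under the macro the failing call runs twice; under the inline function it runs once. This only matters on the error path, so it would not by itself explain a popup appearing when the first call succeeded.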

Post by Rocky »

Please re-download test6:

* Change cuda check macro to a static inline.
* Fix small bug in Sharpen.

Post by hydra3333 »

Hmm, perhaps a cosmic ray reached the earth's surface and flipped a bit at just the right place ? Have seen lower probability things :D

Thank you for the updated test6.

This: h.265/h2s/denoising(good,spatial)/sharpening
worked OK

This: OTA-mpeg2/deinterlacing/denoising(good,temporal)/sharpening
froze, at the place shown in the log extract below. Resource Monitor showed no cpu/disk/network activity
edit: ah, note the popup message mentioned below.

Code: Select all

G:\HDTV\DGtest>TYPE "!_VPY_file!"  2>&1 
import vapoursynth as vs		# this allows use of constants eg vs.YUV420P8 
from vapoursynth import core	# actual vapoursynth core 
#import functool 
#import mvsfunc as mvs			# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
#import havsfunc as haf		# this relies on the .py residing at the VS folder root level - see run_vsrepo.bat 
core.std.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
core.avs.LoadPlugin(r'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex\DGDecodeNV.dll') # do it like gonca https://forum.doom9.org/showthread.php?p=1877765#post1877765 
# NOTE: deinterlace=1, use_top_field=True for "Interlaced"/"TFF" 
# dn_enable=x DENOISE 
# default 0  0: disabled  1: spatial denoising only  2: temporal denoising only  3: spatial and temporal denoising 
# dn_quality="x" default "good"    "good" "better" "best" ... "best" halves the speed compared pre-CUDASynth 
# dn_tthresh float default 75.0 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="best", dn_strength=0.06, dn_cstrength=0.06 ) 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="better", dn_strength=0.06, dn_cstrength=0.06 ) 
video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=3, dn_quality="good", dn_strength=0.06, dn_cstrength=0.06, dn_tthresh=75.0, dn_show=0, sh_enable=1, sh_strength=0.3 ) 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=3, dn_quality="good", dn_strength=0.06, dn_cstrength=0.06, dn_tthresh=75.0, dn_show=1 ) 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False ) 
#video = core.dgdecodenv.DGSource( r'G:\HDTV\DGtest\MPEG2_INTERLACED.DGI', deinterlace=1, use_top_field=True, use_pf=False, dn_enable=1, dn_quality="good", dn_strength=0.06, dn_cstrength=0.06 ) 
#video = core.avs.DGSharpen( video, strength=0.3 ) 
#video = vs.core.text.ClipInfo(video) 
video.set_output() 
G:\HDTV\DGtest>"!vspipeexe64!" --version  2>&1 
VapourSynth Video Processing Library
Copyright (c) 2012-2023 Fredrik Mellbin
Core R65
API R4.0
API R3.6
Options: -
G:\HDTV\DGtest>"!vspipeexe64!" --info "!_VPY_file!"  2>&1 
Width: 720
Height: 576
Frames: 56659
FPS: 25/1 (25.000 fps)
Format Name: YUV420P8
Color Family: YUV
Alpha: No
Sample Type: Integer
Bits: 8
SubSampling W: 1
SubSampling H: 1
Killed the process and re-ran the test without change, same "freeze" result :(
Killed the process and re-ran the test manually without change, in a cmd box except *not* redirecting output to a log file, and it did something ... this popped up:
pop-up2.jpg
and no frames were encoded.

I did install the latest Win11 updates this morning, if that helps.
Same nvidia driver as previously reported.
PC: 3099X 32Gb
VGA: nvidia 2060 Super 8Gb, driver 551.52

Running it the old way (same cudasynth, separate filter calls), but one run only, worked:

Code: Select all

G:\HDTV\DGtest>"!old_vspipeexe64!" --filter-time --container y4m "!_OLD_VPY_file!" --  2>&1 
Output 56659 frames in 126.09 seconds (449.35 fps)
Filtername           Filter mode   Time (%)   Time (s)
DGDenoise            parreq          99.47     125.42
DGSource             unordered       57.40      72.38
DGSharpen            parreq          22.88      28.85

Also, this (8 test encodes in a sequence, which had worked previously): poor VHS-mpeg2/deinterlacing/denoising(best,temporal)/sharpening
run 1) froze at the 3rd encode.
run 2) froze at the 7th encode and the popup below appeared.
pop-up3.jpg
run 3) all 8 encodes worked.
run 4) froze at 2nd encode and the same popup appeared.

Please don't tell me it's my system ;) Oh no, it's worked fine until now !

Also, may one enquire any noticeable effect from the small bug in Sharpen so I can look for it if practicable ?

While ck looks like a do once loop at first glance (never seen it used like that before, unsure why), no doubt it's supposed to be and/or I'm old and too out of date to know otherwise :oops: :)

Cheers

edit: could not guarantee this was not happening in previous versions of test.
I suppose I'll need to have a few tries with separated filter calls, then again with a released version of dg


edit: it's nothing to do with nvdec and the driver is it ? https://rationalqm.us/board/viewtopic.p ... 349#p20349 It seems a tad random.

Post by Rocky »

Thank you. I may know what is going on. Stand by for a new version.

The do-while thing is a standard little hack when using a multiline macro (so that you can put a semicolon after a macro invocation). Since I changed it to a normal function it's not applicable any more.
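
For anyone else puzzled by the idiom, here is a small sketch of why the wrapper matters (the macro and variable names are made up for illustration). Wrapped in `do { } while (0)`, a multi-statement macro expands to a single statement, so it can sit in an unbraced if/else with a trailing semicolon.

```cpp
static int g_a = 0, g_b = 0;

// Multi-statement macro wrapped in do { } while (0): it expands to one
// statement, so "SET_BOTH(x);" is legal anywhere a statement is.
#define SET_BOTH(v) \
do \
{ \
	g_a = (v); \
	g_b = (v); \
} while (0)

// With plain braces instead of do/while, the ';' after the first
// SET_BOTH(1) would end the 'if' and the 'else' would not compile.
static void demo(bool flag)
{
	if (flag)
		SET_BOTH(1);
	else
		SET_BOTH(2);
}
```

The loop body runs exactly once, and compilers typically optimize the loop away, so the wrapper costs nothing at run time.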

"Running it the old way..."

What is the old way and what is the new way?

Gosh, I hope it isn't the conversion to Vapoursynth API4. :?

Post by hydra3333 »

Rocky wrote:
Fri Mar 01, 2024 3:59 am
"Running it the old way..."
What is the old way and what is the new way?
Apologies,
old = separate filter calls on separate lines, using the cudasynth dll
new = filters all on one line, the new way, real cudasynth
Rocky wrote:
Fri Mar 01, 2024 3:59 am
"Running it the old way..."
Gosh, I hope it isn't the conversion to Vapoursynth API4. :?
Oh dear. If it is, then wishing good luck to both of us !

Post by Rocky »

Thank you. Please re-download test 6 and give it a whirl. I had a resource-freeing issue under Vapoursynth that could explain this. Whatever it is, don't worry, we'll get to the bottom of it.

P.S. I stopped drinking coffee. This is my third day. You can guess how I feel :wow:

Post by hydra3333 »

Rocky wrote:
Fri Mar 01, 2024 5:53 am
Please re-download test 6 and give it a whirl.
Cool. I just tried the VHS tests where a .bat runs through 8 files in sequence, generating a new .vpy each time and then using vspipe/ffmpeg to attempt an encode.

Nearly. This time
run #1 it did not like 3 of 8
pop-up4.jpg
run #2 all 8 successfully encoded
run #3 all 8 successfully encoded
run #4 it did not like 7 of 8
run #5 it did not like 1,7,8 of 8 (chrome/youtube confirmed playing in the background)
run #6 it did not like 3 of 8 (same chrome/youtube confirmed playing continuously in the background for the whole time)

Occasionally, I may have a chrome/youtube playing or paused in the background as mentioned above.
Although at a guess that may not help, the issue still appears to be a tad random.
There was also a Windows11 Sandbox working concurrently, flogging its little heart out building ffmpeg.

Post by Rocky »

Merde. Can you please give me the bat file and everything I need to duplicate exactly what you are doing? Also, can you please repeat the same test with test5, which I have re-uploaded? There is no sharpen in test 5 so turn that off if you have it on. Thanks m8, I really appreciate your help.

Post by hydra3333 »

Sure. Just re-downloaded test5 and popped it into the right place and am re-running the tests now.

Will zip up the other things and pop it into a google drive shared folder for you.
I will comment out extraneous stuff such as mediainfo and whatnot.

edit: hmm, test5 yielded a pop-up on run #2 6 of 8. I tend to find I may need to run the test a number of times since it seems to run successfully a few times in a row

Post by hydra3333 »

OK. In this share is
- a .zip of portable vapoursynth with portable python and ffmpeg and DG stuff in 'G:\HDTV\DGtest\Vapoursynth-x64\DGIndex'
- a folder of VHS test .mpg files
https://drive.google.com/drive/folders/ ... sp=sharing

Create a folder `G:\HDTV\DGtest` and
- extract the zip into it
- copy the folder TEST_SOURCE_VIDEOS into there (the folder itself)
- extract whichever DG dlls you wish to test into G:\HDTV\DGtest\Vapoursynth-x64\DGIndex

Of course one can edit the .bat files to change the disk/folder names.

One can use `wrapper_TEST_CUDASynth_VHS_clips.bat` to run the VHS tests in sequence and create a .log file.
When it's finished (or paused by a pop-up) edit the log file and search for all occurrences of 'frame=' and look for 0 fps denoting an encode fail.
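
If eyeballing the log gets tedious, the check could be scripted. A rough sketch, assuming ffmpeg-style progress lines that carry both a `frame=` and an `fps=` field; the function name `isFailedEncodeLine` is hypothetical and the exact log layout is an assumption, so adjust to what the .log actually contains.

```cpp
#include <cstdlib>
#include <string>

// Returns true if 'line' is a progress line (contains "frame=") whose
// "fps=" value parses as zero, which is treated here as a failed encode.
static bool isFailedEncodeLine(const std::string& line)
{
	if (line.find("frame=") == std::string::npos)
		return false;
	const size_t pos = line.find("fps=");
	if (pos == std::string::npos)
		return false;
	const char* start = line.c_str() + pos + 4; // just past "fps="
	char* end = nullptr;
	const double fps = std::strtod(start, &end); // skips leading spaces
	if (end == start) // nothing numeric after "fps="
		return false;
	return fps == 0.0;
}
```

Feeding each line of the log through this function and counting the hits would give the number of suspect encodes per run.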

I really hope it's not something I'm doing wrong !! :(

Post by Rocky »

Thank you. Any results with test5?

Another thing to do is, when you get the error popup, immediately check the GPU memory usage (versus the card's equipped amount) with GPU-Z. The error corresponds to a failure of cuvidCreateDecoder(), so possibly the memory needed is not available at the time.

Running your tests without background GPU usage by other applications is preferable.