Performance question

Support forum for DGDecNV
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance question

Post by Guest 2 »

It had been ages since I last tried encoding something with a software decoder.

Over Christmas I had a bunch of anime to encode to HEVC, so I gave StaxRip's batch processing a try.

Before I configured StaxRip to index with DGIndexNV, it was using FFVideoSource by default.

Well, to my great surprise, its encoding performance is equal to, if not better than, DGDecNV's, even though with the latter the encoding is offloaded to the GPU.

Perhaps my PCIe 2.0 bus is a bit old, perhaps my GTX 1060 3GB is not the fastest card around, and the CPU is an ancient i7-2600K.

Some years ago you could choose between CUDA and CUVID decoding, but that option has disappeared.

Besides that, do you have any tips for improving DGDecNV performance?
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Performance issue

Post by Rocky »

Can you please tell me:

1. the source details
2. both scripts
3. the target format details
4. the performance details for both cases

I'd like to try duplicating it before speculating.
Guest

Performance issue

Post by Guest »

Guest 2 wrote:
Well, to my great surprise, its encoding performance is equal to, if not better than, DGDecNV's, even though with the latter the encoding is offloaded to the GPU.
I believe DGDecodeNV only frame-serves; it doesn't encode. Your CPU is doing the encoding.
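The distinction matters for the numbers in this thread: with a frame server, the encoder pulls decoded frames from the source filter, so offloading decoding to the GPU only speeds up the serving side, while x265 still does all of the compression on the CPU. A toy Python sketch of that pull model; `serve_frames` and `encode` are hypothetical stand-ins, not AviSynth, DGDecodeNV or x265 APIs.

```python
from typing import Iterator


def serve_frames(count: int) -> Iterator[bytes]:
    """Toy stand-in for a source filter (DGSource or FFVideoSource).

    Frames are produced lazily: nothing is decoded until the consumer
    asks for the next one."""
    for n in range(count):
        yield f"frame {n}".encode()  # placeholder for a decoded picture


def encode(frames: Iterator[bytes]) -> int:
    """Toy stand-in for the encoder (x265 on the CPU).

    It pulls frames one by one; the compression work happens here,
    whichever filter served the frames."""
    encoded = 0
    for frame in frames:
        _ = frame[::-1]  # placeholder for the CPU-heavy compression step
        encoded += 1
    return encoded


print(encode(serve_frames(5)))  # prints 5
```

Making the source faster only shortens the time spent inside `serve_frames`; the loop in `encode` costs the same either way.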
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

gonca wrote:
Mon Dec 27, 2021 7:51 am
I believe DGDecodeNV only frame-serves; it doesn't encode. Your CPU is doing the encoding.
Serving, serving. My mistake.
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

Rocky wrote:
Mon Dec 27, 2021 6:06 am
1. the source details
2. both scripts
3. the target format details
4. the performance details for both cases
1) Plain 1080p anime in an MKV container, x264 at 8000 kbit/s CBR
2) No script at all, i.e. the plain lines to serve video to x265, nothing else
3) HEVC 10-bit MKV: x265.exe --crf 22 --tune animation --output-depth 10 --colorprim bt709 --colormatrix bt709 --transfer bt709 --range limited
4) Let me finish the queue and I will give you some results.
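For anyone reproducing the test outside StaxRip, the target settings in point 3 can be written as an explicit argument list. A sketch, assuming an x265 build that accepts an AviSynth script as input (the Patman mod build used in this thread logs an avs+ input stage) and that the raw HEVC output is muxed into MKV as a separate step; the file names are placeholders.

```python
from typing import List


def build_x265_command(script: str, output: str) -> List[str]:
    """Argument list for the x265 settings listed in point 3.

    Assumptions: an x265 build that can read an AviSynth script directly,
    and a raw HEVC output file that is muxed into MKV in a later step."""
    return [
        "x265.exe",
        "--crf", "22",
        "--tune", "animation",
        "--output-depth", "10",
        "--colorprim", "bt709",
        "--colormatrix", "bt709",
        "--transfer", "bt709",
        "--range", "limited",
        "--input", script,    # hypothetical path, e.g. the StaxRip-generated .avs
        "--output", output,   # hypothetical path for the raw .hevc stream
    ]


cmd = build_x265_command("script.avs", "episode.hevc")
print(" ".join(cmd))
# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the arguments as a list rather than one string avoids quoting problems with paths that contain spaces, like the ones in the scripts posted below.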
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Performance issue

Post by Rocky »

This may be an MKV issue. I know my old MKV library has issues. Any chance to try with an M2TS?

"No script at all, i.e. the plain lines to serve video to x265, nothing else"

You need a script to invoke DGSource, so you are confusing me. How do you send the script output to x265? Just whatever StaxRip does? If you need a special version of x265, please tell me what it is and where to get it.

You're not doing any prefetch games (possibly done transparently by StaxRip)?
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

Rocky wrote:
Mon Dec 27, 2021 8:33 am
Any chance to try with an M2TS?
I will.
Rocky wrote:
Mon Dec 27, 2021 8:33 am
You need a script to invoke DGSource, so you are confusing me.
I meant: the scripts are really minimal, just the necessary few lines to invoke DG or FF.
Rocky wrote:
Mon Dec 27, 2021 8:33 am
You're not doing any prefetch games (possibly done transparently by StaxRip)?
As far as I can see, the script is very simple, with no prefetching at all.

DG one:

LoadPlugin("D:\Eseguibili\Media\DGDecNV\DGDecodeNV.dll")
DGSource("G:\Raw\World Trigger\2021 3ª\96 Round finale_temp\temp.dgi")
FF one:

LoadPlugin("D:\Eseguibili\Media\StaxRip Anime\Apps\Plugins\Dual\ffms2\ffms2.dll")
tcFile = "G:\Raw\World Trigger\2021 3ª\96 Round finale_temp\96 Round finale_timestamps.txt" # timestamps file path
Exist(tcFile) ? FFVideoSource("G:\Raw\World Trigger\2021 3ª\96 Round finale.mkv", cachefile="G:\Raw\World Trigger\2021 3ª\96 Round finale_temp\temp.ffindex", timecodes=tcFile) : FFVideoSource("G:\Raw\World Trigger\2021 3ª\96 Round finale.mkv", cachefile="G:\Raw\World Trigger\2021 3ª\96 Round finale_temp\temp.ffindex")
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

Some data for MKV; I will post M2TS results later.

Re-encoding 1080p x264 MKV to x265 (same command line as before):

DG: encoded 33596 frames in 2256.04s (14.89 fps), 1126.28 kb/s, Avg QP:27.89
FF: encoded 33596 frames in 2320.88s (14.48 fps), 1126.28 kb/s, Avg QP:27.89
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Performance issue

Post by Rocky »

You didn't answer my questions about which version of x265 and how you feed the script to it.
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

Rocky wrote:
Mon Dec 27, 2021 12:46 pm
You didn't answer my questions about which version of x265 and how you feed the script to it.
avs+ [INFO]: AviSynth+ 3.7.1 (r3577, master, x86_64)
x265 [INFO]: HEVC encoder version 3.5+21+12-cb341a7ef [Mod by Patman]

The AVS script is generated and launched by StaxRip.

I did another test with an M2TS:

DG
encoded 34377 frames in 2405.01s (14.29 fps), 1025.35 kb/s, Avg QP:27.50

FF
encoded 34377 frames in 2363.18s (14.55 fps), 1025.35 kb/s, Avg QP:27.50
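Putting the MKV and M2TS runs side by side: FPS is just frames divided by encode time, and the relative difference between the two source filters is small and flips sign between containers, which suggests run-to-run noise rather than a real advantage for either. A small Python sketch using only the numbers x265 reported above (it reproduces x265's own FPS figures):

```python
def fps(frames: int, seconds: float) -> float:
    """Average encoding speed in frames per second."""
    return frames / seconds


def pct_faster(a_seconds: float, b_seconds: float) -> float:
    """How much faster run A is than run B, in percent (negative: slower)."""
    return (b_seconds / a_seconds - 1.0) * 100.0


# Encode times in seconds, as reported by x265 in this thread.
runs = {
    "MKV": {"frames": 33596, "DG": 2256.04, "FF": 2320.88},
    "M2TS": {"frames": 34377, "DG": 2405.01, "FF": 2363.18},
}

for container, r in runs.items():
    dg, ff = fps(r["frames"], r["DG"]), fps(r["frames"], r["FF"])
    diff = pct_faster(r["DG"], r["FF"])
    print(f"{container}: DG {dg:.2f} fps, FF {ff:.2f} fps, DG vs FF {diff:+.1f}%")
```

DG comes out roughly 2.9% ahead on the MKV and roughly 1.7% behind on the M2TS.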
Guest

Performance issue

Post by Guest »

Guest 2, what is the CPU usage in each scenario?
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

gonca wrote:
Mon Dec 27, 2021 3:19 pm
Guest 2, what is the CPU usage in each scenario?
As I wrote in my first message: i7-2600K @ 4.5 GHz
renols
Posts: 149
Joined: Tue Feb 22, 2011 2:34 am

Performance issue

Post by renols »

Hi.

I am not quite sure what you expect.

The encoding happens on the CPU, not the GPU. In my opinion it doesn't really matter what you use to serve the frames. DGDecNV can probably serve frames to x265 at a much higher FPS, but x265 is the bottleneck here.

Try running the .avs file in AVSMeter and you will probably see a much higher FPS.
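AVSMeter measures how fast the script alone can deliver frames, with no encoder attached. A minimal, tool-agnostic Python sketch of the same idea: time a frame-fetch callable over a range of frames and report the average FPS. `get_frame` is whatever callable you supply (with a VapourSynth clip you could pass `clip.get_frame`); the dummy source below is purely illustrative. It issues one request at a time, so it will likely report less than tools that request frames in parallel.

```python
import time
from typing import Callable


def measure_fps(get_frame: Callable[[int], object], num_frames: int) -> float:
    """Request frames 0..num_frames-1 in order and return the average FPS.

    Only the source side is timed; no encoder is involved, which is the
    point of a source-only benchmark."""
    start = time.perf_counter()
    for n in range(num_frames):
        get_frame(n)
    elapsed = time.perf_counter() - start
    return num_frames / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Dummy source: "decodes" a frame by allocating a 1080p 8-bit 4:2:0 buffer.
    frame_bytes = 1920 * 1080 * 3 // 2
    print(f"{measure_fps(lambda n: bytearray(frame_bytes), 200):.1f} fps")
```

If this number is far above the encoder's FPS, the source filter is not what limits the encode.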

x265 is just a bitch when it comes to encoding. Seeing 3-4 FPS is not unusual when encoding movies with x265.

Maybe I am missing something with your workflow, but x265 is just much more CPU hungry than x264.

renols
Guest

Performance issue

Post by Guest »

Guest 2 wrote:
Mon Dec 27, 2021 3:23 pm
gonca wrote:
Mon Dec 27, 2021 3:19 pm
Guest 2, what is the cpu usage during each scenario?
As I wrote in my first message, i7-2600k @ 4.5 GHz
Not the CPU type, the usage percentage.
Are you using 100% of the CPU during the encodes?
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

gonca wrote:
Mon Dec 27, 2021 3:49 pm
Are you using 100% of the CPU during the encodes?
With --preset slow, yes; with --preset medium almost always, with rare dips to 96-97%.
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

renols wrote:
Mon Dec 27, 2021 3:32 pm
Maybe I am missing something with your workflow, but x265 is just much more CPU hungry than x264.
I know. Tomorrow morning I will run some tests with AVSMeter and x264.
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Performance issue

Post by Rocky »

That's what I was thinking. The decoding is such a small part of things that all source filters will look alike.
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

Some results with AVSMeter64 and simple AVS scripts, just the few lines to serve the video:

DG:

AVSMeter 3.0.8.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.1 (r3577, master, x86_64) (3.7.1.0)

Number of frames:                    34377
Length (hh:mm:ss.ms):         00:23:53.807
Frame width:                          1920
Frame height:                         1080
Framerate:                          23.976 (24000/1001)
Colorspace:                           YV12

Frames processed:                   34377 (0 - 34376)
FPS (min | max | average):          334.1 | 775.2 | 725.3
Process memory usage (max):         304 MiB
Thread count:                       14
CPU usage (average):                13.9%

GPU usage (average):                40%
VPU usage (average):                90%
GPU memory usage:                   584 MiB
GPU Power Consumption (average):    43.5 W

Time (elapsed):                     00:00:47.398
FF:

AVSMeter 3.0.8.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.1 (r3577, master, x86_64) (3.7.1.0)

Number of frames:                    34377
Length (hh:mm:ss.ms):         00:23:53.807
Frame width:                          1920
Frame height:                         1080
Framerate:                          23.976 (24000/1001)
Colorspace:                           i420

Frames processed:                   34377 (0 - 34376)
FPS (min | max | average):          241.9 | 1115 | 482.1
Process memory usage (max):         110 MiB
Thread count:                       17
CPU usage (average):                79.8%

GPU usage (average):                5%
VPU usage (average):                0%
GPU memory usage:                   473 MiB
GPU Power Consumption (average):    10.7 W

Time (elapsed):                     00:01:11.306
I can't complain about DG performance at all :D
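Both AVSMeter reports are internally consistent: dividing the frame count by the average FPS reproduces the elapsed time (about 47.4 s for DG and 71.3 s for FF), so the "average" figure appears to be overall throughput. A small Python sketch that extracts those fields from the report text and performs the check; the regular expressions are written against the layout shown above and are an assumption about AVSMeter's output format in general.

```python
import re

# Excerpt of the DG AVSMeter report above (only the fields used here).
DG_REPORT = """\
Number of frames:                    34377
FPS (min | max | average):          334.1 | 775.2 | 725.3
CPU usage (average):                13.9%
Time (elapsed):                     00:00:47.398
"""


def parse_avsmeter(report: str) -> dict:
    """Extract frame count, average FPS, average CPU usage and elapsed seconds."""
    frames = int(re.search(r"Number of frames:\s+(\d+)", report).group(1))
    avg_fps = float(re.search(r"FPS \(min \| max \| average\):.*\|\s*([\d.]+)\s*$",
                              report, re.MULTILINE).group(1))
    cpu = float(re.search(r"CPU usage \(average\):\s+([\d.]+)%", report).group(1))
    h, m, s = re.search(r"Time \(elapsed\):\s+(\d+):(\d+):([\d.]+)", report).groups()
    elapsed = int(h) * 3600 + int(m) * 60 + float(s)
    return {"frames": frames, "avg_fps": avg_fps, "cpu": cpu, "elapsed": elapsed}


r = parse_avsmeter(DG_REPORT)
predicted = r["frames"] / r["avg_fps"]
print(f"{predicted:.2f} s predicted vs {r['elapsed']:.2f} s measured, CPU {r['cpu']}%")
```

Feeding in the FF report the same way gives about 71.31 s predicted against 71.306 s measured, at 79.8% CPU instead of 13.9%.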

Now the results with x264.exe --crf 20 --preset slow --tune animation --level 4.1 --aq-mode 2 --colorprim bt709 --colormatrix bt709 --transfer bt709 --range tv on the very same scripts:

DG:

encoded 34377 frames, 31.55 fps, 2561.83 kb/s, duration 0:18:09.76
FF:

encoded 34377 frames, 31.39 fps, 2561.83 kb/s, duration 0:18:15.00
With --preset medium:

DG:

encoded 34377 frames, 38.58 fps, 2786.29 kb/s, duration 0:14:51.06
FF:

encoded 34377 frames, 36.40 fps, 2786.29 kb/s, duration 0:15:44.36
Perhaps the decoding task is really a small part of the whole process.
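That intuition can be put into rough numbers with the measurements above. If the encoder fetched each frame and only then compressed it (no overlap), the share of wall time spent serving frames would be the encode FPS divided by the source-only FPS. A Python sketch under that simplifying assumption; it ignores that FFMS2's decoding competes with the encoder for the same CPU cores, so it may understate FF's real cost.

```python
def decode_share(encode_fps: float, source_fps: float) -> float:
    """Rough fraction of each frame's wall time spent serving the frame.

    Assumes serving and encoding run strictly one after the other, so
    t_frame = 1/encode_fps and t_serve = 1/source_fps; any overlap
    between the two makes the true share smaller."""
    return (1.0 / source_fps) / (1.0 / encode_fps)


# Source-only speeds from AVSMeter and whole-encode speeds from this thread.
source_fps = {"DG": 725.3, "FF": 482.1}
encodes = {
    "x264 --preset slow": {"DG": 31.55, "FF": 31.39},
    "x265 (M2TS)": {"DG": 14.29, "FF": 14.55},
}

for label, speeds in encodes.items():
    for src, enc in speeds.items():
        share = decode_share(enc, source_fps[src])
        print(f"{label:<20} {src}: ~{share:.1%} of frame time")
```

Under this model serving accounts for only a few percent of each frame (roughly 2 to 7%), so even a much faster source filter could not move the overall encode speed by much.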

Unfortunately my rig doesn't have the power to handle 4K. If someone else could run some tests, that would be useful.
Guest

Performance issue

Post by Guest »

Using VapourSynth Editor to benchmark the script:
4000 frames, 4K

import vapoursynth as vs
from vapoursynth import core

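#####FRAME SERVER#####
# Load the DGDecodeNV plugin (path from the poster's own install)
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecodenv/DGDecodeNV.dll")
# Open the DGIndexNV index file; decoding runs on the NVIDIA GPU
clip = core.dgdecodenv.DGSource(r'F:\4K\BLACK PANTHER PID 1011.dgi', fieldop=0)

# Convert to 10-bit 4:2:0; the frame size is unchanged, so nothing is rescaled
clip = core.resize.Point(clip, format=vs.YUV420P10)

clip.set_output()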
190 fps
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance issue

Post by Guest 2 »

gonca wrote:
Tue Dec 28, 2021 7:07 am
Using VapourSynth Editor to benchmark the script:
And FFMS2 as reference?
Guest

Performance question

Post by Guest »

I don't have FFMS2 installed; I don't need it.
The FPS depends not only on the resolution but also on the CPU and GPU.
Performance comparisons would have to account for this as well.
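One way to at least factor out resolution is to compare pixel throughput instead of frames per second. A rough Python sketch, assuming the 4K source is 3840×2160 (the post does not say) and ignoring that the two measurements used different GPUs, codecs and tools:

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Luma pixels delivered per second."""
    return width * height * fps


uhd = pixel_rate(3840, 2160, 190)    # VapourSynth Editor, 4K source (assumed 3840x2160)
fhd = pixel_rate(1920, 1080, 725.3)  # AVSMeter, 1080p source, DG average
print(f"4K: {uhd / 1e9:.2f} Gpx/s, 1080p: {fhd / 1e9:.2f} Gpx/s")
```

Both land around 1.5 to 1.6 gigapixels per second, so on a per-pixel basis the two results are closer than 190 fps versus 725 fps suggests; hardware and codec differences still have to be accounted for separately.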

Edit
FFMS2 is not frame accurate
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Performance question

Post by Rocky »

gonca wrote:
Tue Dec 28, 2021 9:05 am
FFMS2 is not frame accurate
In some cases, yes. People resort to remuxing transport streams to MKV to get around it. That is absurd to me. Why not just fix it? Accurate random access is actually what we hang our hat on for DGDecNV. Sure, in some use cases faster decoding can be achieved and can contribute to faster transcoding, but not always, as we have seen in this thread.
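Frame-accurate random access means that seeking straight to frame n returns exactly the frame you would get by decoding from the start. A minimal, generic Python sketch of how one might check that for any source (this is not how DGDecNV or FFMS2 test themselves): hash every frame in a linear pass, then request frames in shuffled order from a fresh instance and compare. `open_source` is a hypothetical factory returning a frame-bytes callable; the two toy sources are illustrative only, and adapting this to a real source filter (for example, hashing each plane's data) is left as an assumption.

```python
import hashlib
import random
from typing import Callable, List

FrameGetter = Callable[[int], bytes]


def check_random_access(open_source: Callable[[], FrameGetter], num_frames: int,
                        samples: int = 100, seed: int = 0) -> List[int]:
    """Return the frame numbers where a seek disagrees with a linear decode.

    A fresh source is opened for each pass so that a frame cache filled
    during the linear pass cannot hide seeking errors."""
    linear = open_source()
    reference = [hashlib.sha1(linear(n)).hexdigest() for n in range(num_frames)]

    picks = random.Random(seed).sample(range(num_frames), min(samples, num_frames))
    seeker = open_source()
    return [n for n in picks if hashlib.sha1(seeker(n)).hexdigest() != reference[n]]


def open_exact_source() -> FrameGetter:
    """Toy source whose frame n is always the bytes of n."""
    return lambda n: n.to_bytes(4, "little")


def open_sloppy_source() -> FrameGetter:
    """Toy source that lands one frame early after any non-sequential seek."""
    state = {"next": 0}

    def get_frame(n: int) -> bytes:
        actual = n if n == state["next"] else max(n - 1, 0)
        state["next"] = n + 1
        return actual.to_bytes(4, "little")

    return get_frame


print(len(check_random_access(open_exact_source, 500)))  # 0
print(len(check_random_access(open_sloppy_source, 500)) > 0)
```

The sloppy source passes the linear pass perfectly and only fails once frames are requested out of order, which is exactly the kind of error a sequential-only benchmark would never reveal.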

One thing I want to do is look into improving MKV parsing. While it's not horribly bad, why not fix that too, if possible.
Guest

Performance question

Post by Guest »

To appease the poisonous frog of torment
ffms2.log (149.48 KiB)
dg.log (149.43 KiB)
Bear in mind that DGDecodeNV was using approximately 40% of my VPU and 1.7% of the CPU, while FFMS2 was using no VPU and 34.7% of the CPU, which is an absurd amount on my system and would significantly impact an HEVC encode.

I'll stick to DGIndexNV
PS
FFMS2 had an (eternally) long delay when launching in VirtualDub2 and AVSMeter, while DGIndexNV was nearly instantaneous.
If the script is meant for encoding, DGDecodeNV is the superior choice, with practically no CPU load.
Guest 2
Posts: 903
Joined: Mon Sep 20, 2010 2:18 pm

Performance question

Post by Guest 2 »

gonca wrote:
Tue Dec 28, 2021 11:46 am
I'll stick to DGIndexNV
And of course I will too... :belly-laugh:
Rocky
Posts: 3556
Joined: Fri Sep 06, 2019 12:57 pm

Performance question

Post by Rocky »

Good to hear. ;)

Still, I'd like to do some testing before I mark this thread resolved.