Rather than keeping a copy of existing nzb files to check for duplicates, is it possible to use another method?
1 - a unique identifier in the nzb file (if there is one)
2 - an md5 of the nzb file
I'd rather keep a list of IDs than keep a copy of all the nzbs I've downloaded (including their names), for privacy reasons.
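For illustration, option 2 could look roughly like the sketch below. The function name `nzb_file_md5` is made up for this example and is not part of any existing program. One caveat: a hash of the raw file only matches byte-identical nzbs, so the same post fetched from two sites may still produce different digests if the files differ in any way.

```python
import hashlib


def nzb_file_md5(path):
    """Return the hex MD5 digest of the raw bytes of the file at `path`.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large nzb files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Only the resulting 32-character digest would need to be kept, not the nzb itself or its name.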
Add a smarter dupe check
Re: Add a smarter dupe check
lysp wrote: rather than needing to keep a copy of all the nzb's i've downloaded (including there name) for privacy reasons.
So you're not saving the downloads either?
We're looking at an improved method, but it will take a while.
Re: Add a smarter dupe check
You have me there
However, most of my files are automatically renamed to remove groups etc. and moved to another server.
I'm guessing that, like most people, I'd prefer not to keep my complete download history in a single folder, but still have a check to make sure I (or my automation routines) don't get the same thing twice and waste bandwidth.
I had a quick follow-up look and saw that an nzb has a list of message-IDs, but not a single ID that identifies it as a whole.
Possibly implementing something like:
Hash(MessageID1 + MessageID2 + MessageID3 + MessageID4 ... MessageIDX) to identify a unique nzb.
An added benefit is that it won't take up much space either.
Another issue I found (which is kind of a bug) is that nzbs with the same name get rejected by the dupe check even though the contents may be different.
E.g. if a website standardises a download to "Cart.NZB", then that may be rejected. The above hash method would bypass this "bug".
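A minimal sketch of the Hash(MessageID1 + ... + MessageIDX) idea, assuming the message-IDs are the text of the `<segment>` elements in the nzb XML. The function name `nzb_content_hash`, the choice of SHA-1, and sorting the IDs before hashing (so that element order doesn't change the result) are all assumptions made for this example, not anything the program does today.

```python
import hashlib
import xml.etree.ElementTree as ET


def nzb_content_hash(nzb_xml):
    """Return one hex digest computed from all segment message-IDs in an nzb.

    Hypothetical helper: `nzb_xml` is the nzb document as a string or bytes.
    """
    root = ET.fromstring(nzb_xml)
    # Match <segment> elements whether or not the document uses an XML namespace.
    message_ids = sorted(
        elem.text.strip()
        for elem in root.iter()
        if elem.tag.rsplit("}", 1)[-1] == "segment" and elem.text
    )
    digest = hashlib.sha1()
    for message_id in message_ids:
        digest.update(message_id.encode("utf-8"))
        digest.update(b"\n")  # separator so adjacent IDs cannot run together
    return digest.hexdigest()
```

Because the file name never enters the hash, two different downloads both saved as "Cart.NZB" would get different digests, while the same post under two names would get the same one.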
Re: Add a smarter dupe check
"Duplicate" is not a well-defined term in this case.
Most people don't want to download the same item twice, not even from different release groups.
So a name is a valid criterion and so is a content check.
But neither is very reliable.
Duplicate detection is mostly effective against the same items sneaking in due to
different RSS feeds carrying the same posts.
We are looking at detection based on the methods used by the Sorting functions.
But of course that's less useful for unique items.
Re: Add a smarter dupe check
That's probably taking it one step further.
Currently the dupe check works on release name, and most scene releases include the group-name as part of the release name.
So dupe checking across different groups currently will not work.
However, changing it to a hash would keep the checking functionality/logic the same, but without exposing the file/release names to people browsing the computer.
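Something along these lines would be enough to keep the history as a plain list of hashes instead of a folder of nzbs. It's only a sketch: `is_duplicate` and the one-hash-per-line store file are invented for this example and say nothing about how the program stores its history.

```python
import os


def is_duplicate(content_hash, store_path):
    """Return True if `content_hash` was seen before; otherwise record it and return False.

    Hypothetical helper: `store_path` is a text file holding one hash per line.
    """
    seen = set()
    if os.path.exists(store_path):
        with open(store_path, "r", encoding="ascii") as handle:
            seen = {line.strip() for line in handle if line.strip()}
    if content_hash in seen:
        return True
    # First time this hash is seen: append it so later checks catch a repeat.
    with open(store_path, "a", encoding="ascii") as handle:
        handle.write(content_hash + "\n")
    return False
```

Anyone browsing the computer would then only see a file of hex strings, not release names.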