How to measure retention?

bluenote
Jr. Member
Posts: 57
Joined: November 19th, 2010, 3:28 am

How to measure retention?

Post by bluenote »

What is the easiest way to measure retention?  I'm considering doing up a little chart of some usenet providers, + my own home provider
and I'm interested in measuring it myself rather than going with (possibly nonexistent/outdated) info published on websites.

thanks for any ideas!
inpheaux
Administrator
Posts: 563
Joined: January 16th, 2008, 9:14 pm

Re: How to measure retention?

Post by inpheaux »

See what the host claims their retention is, attempt to download something that old, then continue going either up or down in days until you find the edge.
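
The "go up or down in days until you find the edge" idea can be sketched as a binary search. This is only an illustration: `is_available` is a hypothetical callback (for example, "try to fetch a known post of this age from the server"), and the search assumes availability is monotonic, i.e. everything younger than the edge is present and everything older is gone, which real servers may not guarantee.

```python
def find_retention_edge(is_available, low_days, high_days):
    """Return the oldest age in days that is still available.

    Assumptions (not from the thread): is_available(age) is a callback you
    supply, low_days is known to be available, high_days is known to be
    missing, and availability only flips once between the two.
    """
    while high_days - low_days > 1:
        mid = (low_days + high_days) // 2
        if is_available(mid):
            low_days = mid   # still there, so the edge is older than mid
        else:
            high_days = mid  # already gone, so the edge is younger than mid
    return low_days
```

Starting from the provider's claimed retention as `high_days` keeps the number of test downloads small: a 1000-day range needs about ten probes.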

Testing this isn't fun, but good luck!
sander
Release Testers
Posts: 8811
Joined: January 22nd, 2008, 2:22 pm

Re: How to measure retention?

Post by sander »

I've been thinking about the same. Methods to do it:

Passively:
1) analyze SAB's log files; SAB knows the age of a post, and knows which servers don't have the post ("missing").
(BTW: I don't see positive mentions in the log file ... I don't know if you can turn that on. Even "+Debug" does not show positive hits.)

So ... analyze "missing" lines like:

Code:

Thread [email protected]:119: Article [email protected] missing
Combined with the age of the post, you now have an indication that the retention of news.lightningusenet.com is lower than the age of the post. All SAB-users together (meaning: SAB itself, after user's consent) could post this anonymously, and we'll get a good overview.

Special attention should be paid to "missing from all servers": it could mean the post is older than the retention of all servers you have enabled, *or* the post is simply not there at all as in "never posted". In the last case, it should be left out from the retention determination.
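
A minimal sketch of the passive approach, for illustration only. The regular expression assumes a line shape of `Thread <n>@<host>:<port>: Article <message-id> missing`; check it against your own sabnzbd.log before relying on it. The function names are made up for this example. Because a post that is missing from every enabled server cannot be told apart from one that was never posted, the sketch simply drops those articles.

```python
import re
from collections import defaultdict

# Assumed log line shape (verify against your own log):
#   Thread <n>@<host>:<port>: Article <message-id> missing
MISSING_RE = re.compile(
    r"Thread \d+@(?P<server>[^:\s]+):\d+: Article (?P<article>\S+) missing"
)

def collect_missing(log_lines):
    """Map each article id to the set of servers that reported it missing."""
    missing = defaultdict(set)
    for line in log_lines:
        match = MISSING_RE.search(line)
        if match:
            missing[match.group("article")].add(match.group("server"))
    return missing

def drop_missing_everywhere(missing, enabled_servers):
    """Keep only articles that at least one enabled server did not miss."""
    enabled = set(enabled_servers)
    return {
        article: servers
        for article, servers in missing.items()
        if not enabled <= servers  # skip "missing from all servers"
    }
```

What remains are articles that some server had and another did not, which is the only case where blaming a specific server's retention is reasonably safe.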

@Shypike: is there a way to get positive hits into the log file (and not only "missing")? Or is it already there, and am I overlooking it? With positive hits, you get a better overview of what's going on.


Or, actively:
2) post a 10MB file each day (or week), with a per-day (or per-week) unique identifier like "retention-20110401.bin". Then, each day (or week), try to retrieve old posts from different newsservers. Post the results.
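
For the active approach, the naming scheme makes it easy to work out which probe file corresponds to which age. A small sketch, assuming one probe per day named as in the example above; the function names are invented for illustration, and actually posting and fetching the files is left out.

```python
from datetime import date, timedelta

def probe_name(day):
    """Name of the probe file posted on the given day, e.g. retention-20110401.bin."""
    return "retention-%s.bin" % day.strftime("%Y%m%d")

def probes_to_check(today, ages_days):
    """Map each age in days to the name of the probe that is that old today."""
    return {age: probe_name(today - timedelta(days=age)) for age in ages_days}
```

Asking each newsserver for the probes at a handful of ages (say 30, 100, 365 and 800 days) then shows directly which ages it can still serve.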
Last edited by sander on April 1st, 2011, 2:16 am, edited 1 time in total.
Please don't send me unrequested PM's; the forum is the best way to communicate.
If someone helps you, please reply to that help.
If you like our support, check our special newsserver deal or donate at: https://sabnzbd.org/donate
sander
Release Testers
Posts: 8811
Joined: January 22nd, 2008, 2:22 pm

Re: How to measure retention?

Post by sander »

@shypike:

Question 1:
Can I put in a "positive" hit (so: article successfully downloaded) somewhere in the SAB source? I tried downloader.py: "logging.error('SJ-code')" around line 566, but it prints a lot of lines per article ...

Question 2:
the date of the post is not in the sabnzbd.log, or is it? Do I need to search the .NZB.gz in the nzb-backup-dir to find the date?
shypike
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: How to measure retention?

Post by shypike »

Q1: the decoder already logs positive hits or do you want the specific server?
Q2: That's right, the age isn't logged.

Don't go overboard with this.
You'll never get an exact figure anyway.
The actual deletion policy is very likely to be determined by available space
instead of the age of the article.
sander
Release Testers
Posts: 8811
Joined: January 22nd, 2008, 2:22 pm

Re: How to measure retention?

Post by sander »

In the meantime I've written a proof of concept. It first parses the NZB-backup directory to store all article IDs and their post dates in a hash. Then it parses the sabnzbd.log* files looking for "missing", and constructs the statistics based on that.
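
The first step (article IDs and post dates from the NZB backups) could look roughly like the sketch below. It is not the proof-of-concept script itself, which isn't shown in this thread. It assumes the backups are gzipped NZB files using the standard NZB XML namespace, where each `<file>` carries a Unix timestamp in its `date` attribute and each `<segment>` holds one article ID; the function names are illustrative.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path

NZB_NS = "{http://www.newzbin.com/DTD/2003/nzb}"

def article_dates_from_nzb(xml_bytes):
    """Return {article id: post date as Unix timestamp} for one NZB document."""
    dates = {}
    root = ET.fromstring(xml_bytes)
    for file_el in root.iter(NZB_NS + "file"):
        posted = int(file_el.get("date", "0"))
        for segment in file_el.iter(NZB_NS + "segment"):
            if segment.text:
                dates[segment.text.strip()] = posted
    return dates

def load_backup_dir(backup_dir):
    """Merge the article dates of every *.nzb.gz file in a directory."""
    dates = {}
    for path in Path(backup_dir).iterdir():
        if path.name.lower().endswith(".nzb.gz"):
            with gzip.open(path, "rb") as handle:
                dates.update(article_dates_from_nzb(handle.read()))
    return dates

def age_in_days(posted_ts, now_ts):
    """Whole days between the post date and now (both Unix timestamps)."""
    return int((now_ts - posted_ts) // 86400)
```

Looking up each "missing" article ID from the log in this dictionary gives the age that the statistics are built on. The IDs in the log may be written slightly differently (for instance with angle brackets), so some normalisation may be needed.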

The result is here:

Code:

sander@lifebook:~/.sabnzbd$ python retention-determination.py   | sort | uniq -c

   1242 newsserver X retention probably less than Y days:  newszilla6.xs4all.nl 217
     79 newsserver X retention probably less than Y days:  newszilla6.xs4all.nl 33
     36 newsserver X retention probably less than Y days:  newszilla6.xs4all.nl 836
     13 newsserver X retention probably less than Y days:  reader.ipv6.xsnews.nl 836

sander@lifebook:~/.sabnzbd$ 

So:
newszilla6's retention is something less than 33 days
ipv6.xsnews' retention is something less than 836 days

With more downloads of different ages, the retention info will become more specific. And indeed there is probably no fixed cut-off date, but this will give an indication of the retention, and even of the completeness.
Last edited by sander on April 1st, 2011, 9:04 am, edited 1 time in total.
sander
Release Testers
Posts: 8811
Joined: January 22nd, 2008, 2:22 pm

Re: How to measure retention?

Post by sander »

shypike wrote: Q1: the decoder already logs positive hits or do you want the specific server?
How can I see a positive article hit? IMHO with a clean download, there's no article id mentioned at all in the log file?

I do see "oot.105.1/oot.105-diff.r02", but that's the file, not the article id, right?

EDIT:
The server name would be very nice, as it would be a positive signal about retention, but if not, I have to rely on "missing". I just want to make sure I'm not looking at a non-posted article, and blaming a specific newsserver for that.
Last edited by sander on April 1st, 2011, 9:16 am, edited 1 time in total.
sander
Release Testers
Posts: 8811
Joined: January 22nd, 2008, 2:22 pm

Re: How to measure retention?

Post by sander »

Interesting:

Code:

sander@lifebook:~/.sabnzbd$ python retention-determination.py   | sort | uniq -c

      2 newsserver X retention probably less than Y days:  newsreader3.eweka.nl 710

   1242 newsserver X retention probably less than Y days:  newszilla6.xs4all.nl 217
     79 newsserver X retention probably less than Y days:  newszilla6.xs4all.nl 33
     11 newsserver X retention probably less than Y days:  newszilla6.xs4all.nl 541
    829 newsserver X retention probably less than Y days:  newszilla6.xs4all.nl 710
     36 newsserver X retention probably less than Y days:  newszilla6.xs4all.nl 836

     83 newsserver X retention probably less than Y days:  reader.ipv6.xsnews.nl 710
     14 newsserver X retention probably less than Y days:  reader.ipv6.xsnews.nl 836

sander@lifebook:~/.sabnzbd$ 
From the above, I think I can deduce that
1) I tried to download posts with ages of 33, 217, 541, 710 and 836 days (and maybe younger, but that's not logged)
2) newszilla6's retention is probably below 33 days

3) ipv6.xsnews apparently can handle posts of 33, 217 and 541 days (and not of 710 and 836 days). So retention is between 541 and 710 days

4) eweka has only 2 missing 710-day articles, and *no* missing 836-day articles. I would say retention is above 836 days ... Interesting.
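
The reasoning in points 2 to 4 can be written down as a small rule: walk the tried ages from oldest to youngest, and as long as the server missed articles of that age, keep lowering the upper bound; the first age without misses is the lower bound. This is a sketch under a loud assumption: "no missing lines for an age" is treated as "the server delivered that age", which the log does not actually confirm. The function name is made up for illustration.

```python
def retention_bounds(tried_ages, missed_ages):
    """Return (lower, upper) retention bounds in days for one server.

    tried_ages: all post ages that were downloaded.
    missed_ages: the ages for which this server logged "missing".
    Assumption: an age with no "missing" line counts as a successful download.
    None means the bound could not be determined from the data.
    """
    lower = None
    upper = None
    for age in sorted(tried_ages, reverse=True):
        if age in missed_ages:
            upper = age   # server could not serve this age
        else:
            lower = age   # oldest age it apparently could serve
            break
    return lower, upper
```

With the tried ages from point 1, a server that missed every age comes out as `(None, 33)`, one that missed only 710 and 836 as `(541, 710)`, and one that missed only some 710-day articles as `(836, None)`, matching the three conclusions above.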
sander
Release Testers
Posts: 8811
Joined: January 22nd, 2008, 2:22 pm

Re: How to measure retention?

Post by sander »

bluenote wrote: What is the easiest way to measure retention?  I'm considering doing up a little chart of some usenet providers, + my own home provider
and I'm interested in measuring it myself rather than going with (possibly nonexistent/outdated) info published on websites.

thanks for any ideas!
@bluenote: Which Operating System do you use: Linux, Mac or Windows?
bluenote
Jr. Member
Posts: 57
Joined: November 19th, 2010, 3:28 am

Re: How to measure retention?

Post by bluenote »

Hey sander

that's some impressive stuff you've got going there:)

I use windows though ..

thx