[mnet-devel] some questions and comments on NGrouting
Zooko
zooko at zooko.com
Tue Aug 12 16:07:42 BST 2003
To: freenet-dev and mnet-devel
[Note that replies sent to mnet-devel will be held by the mailing list software
pending my approval. I will approve *all* replies which are not about penis-
enlargement or Nigerian millions, and I'll also add the sender's e-mail to
the "automatically approved senders" list.]
Hi there!
I'm Zooko, coordinator of the <a href="http://mnet.sf.net/"> Mnet project </a>.
You might be interested in these sections of our FAQ:
http://mnet.sourceforge.net/faq.php#diff-Mnet-Freenet
http://mnet.sourceforge.net/faq.php#diff-Mnet-Freenet-2
Ian Clarke asked me to comment on the NGrouting scheme. We've had similar
problems, and we've developed similar but not identical solutions for Mnet.
I'm reading <a href="http://freenet.sourceforge.net/index.php?page=ngrouting">
this doc </a> and I have a few questions and comments.
Note: my motivation in writing this is *not* to proclaim that Mnet is better
than Freenet. It is to help both of our development efforts by learning from
the other. Nonetheless, the fact that I do direct comparisons of specific
techniques from each design, and that I point out potential problems in
Freenet's design, will probably trigger the "us vs. them" reflex in your
brain and engender strong emotions. Please be polite in your response, and
I'll do the same for you.
Some questions:
* What does "must be efficiently serializable" mean?
* How are queries routed *alternatively* in order to attempt to bypass a Data
Not Found ("DNF") event? (I'm especially interested in this question, as
I haven't figured out how I want to do it in Mnet yet. ;-))
* When do Freenet nodes establish links to newly discovered peers (discovered
through DataSource information)? Are Freenet nodes expected to use
NGrouting measurements to choose when to link to new peers?
Some comments:
Unfortunately Mnet's solution to this problem is not yet nicely documented the
way the NGrouting design is. However, Mnet's approach is simple enough that
I can relate it while discussing Freenet's.
It seems like there are two components of the algorithm which are separable
from one another, both in Mnet and in Freenet. One is how to collapse the
various kinds of measurements of performance -- latency, throughput, hit
rate (== 1-DNF rate), connection-failure-rate -- into a single scalar. The
other is how to combine multiple such scalars collected through history into a
single scalar prediction of which of your peers will likely perform best on
the query that you are about to send.
As to the first -- summarizing measurements into a single scalar -- Both Mnet
and Freenet do this, although it isn't the only option -- one could have a
more complicated system that keeps these measurements as separate scalars.
Mnet's current algorithm is simpler than Freenet's, and comparing them might
be useful. Mnet's algorithm is to store two numbers: the historical hit rate
and the historical turnaround time (the time elapsed between initiating the
request and getting the resulting data). If an Mnet node initiates a request
and it does not get the resulting data, then this has no effect on the
historical turnaround time, but it lowers the historical hit rate.
Now the way these two values are combined is the observation that the inverse
of latency is rate. That is, if X is the average number of seconds you wait
to get a block of data, then 1/X is the average number of blocks of data you
get per second. If Y is the average hit rate -- the number of requests which
yielded the correct data divided by the total number of requests -- then
(1/X) * Y is the average rate of blocks you received per request you sent.
(It was my wife, Amber Wilcox-O'Hearn, who pointed this out to me.)
So Mnet combines all kinds of failure (stored in hit rate) and performance
(stored in average turnaround time) into a single scalar: (1/latency) * hitrate.
As I said, this is simpler than Freenet's technique, which keeps track of DNFs
and connection failures separately, and attempts to heuristically separate
"false miss" DNFs -- where the data is present in Freenet, but the node didn't
find it -- from "true miss" DNFs -- where the data isn't present in Freenet.
One potential problem with the way it does that is that it rests on the
assumption that true DNFs -- queries for data that is absent from Freenet --
is evenly spread throughout the keyspace. I can imagine cases where that
assumption becomes untrue, even for extended periods of time or for
significant fractions of the keyspace.
Okay, on to the "history and combination" part, which is where NGrouting gets
*really* sophisticated.
If you haven't already read the NGrouting doc, you probably ought to before
reading this.
Mnet's history and combination is again relatively simple. The "history" part
is accomplished via an exponential fall-off filter:
historicalvalue := historicalvalue * 0.9 + newsample * 0.1
You can of course replace "0.9" and "0.1" with other constants of your choice,
provided that they sum to 1.0. It's actually slightly more complicated than
this, since we store variance as well as mean, but you get the idea.
This is similar to the *vertical* component of the NGrouting scheme, where "0.1"
is similar to the "forgetfulness" level.
The "combination" part is trivial in Mnet: the entire keyspace is summed up
into the single "latency" and "hitrate" scalars.
Obviously this can work only because the job of partitioning the key space is
separated from the which-peer-to-route-to question in Mnet. (That job will be
performed, in Mnet v0.7, by a separate mechanism as part of the DHT-like
routing.)
The two-dimensional approach Freenet uses is interesting, and as far as I can
tell it is a reasonable way to combine the keyspace-related performance
measurements that Freenet wants.
I'm eager to find out how it works in practice. Unfortunately I don't think
it will be easy to make any solid observations of how it works, especially the
sophisticated and complex bits like the influence of the minimal-DNF
subtraction and the two-dimensional history/combination technique. It will
require some really good simulations or really good self-monitoring mechanism
to provide a clear enough picture for us to judge the impact of those details,
in my opinion.
There is one funny thing in the two-dimensional history: the horizontal motion
of the ten points means that the impact of a new sample is different depending
on whether a point has already been dragged across that X position. I mean:
consider a new sample comes in at X=0.25, and two of the ten points need to be
updated. Well, if this is a pristine set where the first point is at X=0.1,
the second point is at X=0.2, and so on, then the second and third points are
the two that will be moved toward the new sample.
However, if a lot of samples have previously been registered at 0.21, which
have "dragged" the third point all the way over so that it is currently at
0.24, then this new sample at 0.25 will affect the third and *fourth* points.
This isn't necessarily bad, but it was a little surprising to me when
I realized it, and it is a way in which NGrouting is not scale-free. That is:
suppose there is a small Freenet where one node receives requests which are
evenly distributed across 0.4 of the keyspace. Suppose there is a large
Freenet where one node receives requests which are evenly distributed across
0.004 of the keyspace. The former will move 5 of the points after a few dozen
new samples have come in. The latter will move only 2.
Okay, that's all I have to say about the details of the algorithms.
Regards,
Zooko
http://zooko.com/
-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
mnet-devel mailing list
mnet-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mnet-devel
More information about the Mnet-devel
mailing list