[mnet-devel] idea for download strategy
Zooko
zooko at zooko.com
Fri Mar 28 01:38:21 GMT 2003
This isn't fully-formed. It draws from several sources: old download code (Jim,
Bram Cohen, Greg Smith), BlockWrangler/GSR (me, Hauke, Myers),
BlockWrangler/KISS (Luke), and general IRC chitchat (et al.). It's similar to
current BlockWrangler/GSR.
So you want to download a file. You'll do the following thing over and over
(*when* exactly to do it is part of the issue, but we'll worry about that
later).
First, if some blocks have been located, download the best one. (Which is the
best? We'll worry about that later.)
Second, if you don't have enough currently-outstanding "do you have blocks"
requests for discovering block locations, then choose the block that you most
want to locate (which is that? We'll worry about it later.), and pick the best
blockserver for that block (how? We'll worry about that later.), and send a "do
you have blocks" message to that blockserver. It's a waste to send a "do you
have blocks" message with only a single blockId in it, so fill it out with the
best 32 blockIds for that blockserver. (How do you choose the best blockIds for
a blockserver? We'll worry about that later.)
Okay, I think that's it. This is what BlockWrangler currently does, and all the
"worry about it later" parts are supposed to be delegated to a
BlockWranglingStrategy. A particularly subtle one is "when do we act and when
do we wait?".
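The two-step loop above can be sketched roughly as follows. This is only an illustration, not BlockWrangler's actual code: `DownloadState`, `NaiveStrategy`, `step`, and the limit `MAX_OUTSTANDING_DYHB` are hypothetical names, and the naive strategy just picks the alphabetically first candidate wherever the post says "we'll worry about that later".

```python
from dataclasses import dataclass, field

MAX_IDS_PER_DYHB = 32      # from the post: fill each query with 32 blockIds
MAX_OUTSTANDING_DYHB = 4   # assumption: the post gives no number


@dataclass
class DownloadState:
    located: dict = field(default_factory=dict)  # block_id -> set of servers that have it
    unlocated: set = field(default_factory=set)  # block_ids with no known location
    outstanding_dyhb: int = 0                    # unanswered "do you have blocks" requests


class NaiveStrategy:
    """Placeholder answers to every "worry about that later" question."""

    def __init__(self, servers):
        self.servers = list(servers)

    def should_act(self, state):
        return True  # "when to act and when to wait" is left open

    def best_located_block(self, located):
        block_id = min(located)
        return block_id, min(located[block_id])

    def most_wanted_block(self, unlocated):
        return min(unlocated)

    def best_server_for(self, block_id):
        return self.servers[0]

    def best_block_ids_for(self, server, unlocated, limit):
        return sorted(unlocated)[:limit]


def step(state, strategy):
    """Run one iteration; return the (message, server, block_ids) tuples to send."""
    messages = []
    if not strategy.should_act(state):
        return messages
    # First: if some blocks have been located, request the best one.
    if state.located:
        block_id, server = strategy.best_located_block(state.located)
        messages.append(("request block", server, [block_id]))
    # Second: top up the outstanding "do you have blocks" requests.
    if state.unlocated and state.outstanding_dyhb < MAX_OUTSTANDING_DYHB:
        wanted = strategy.most_wanted_block(state.unlocated)
        server = strategy.best_server_for(wanted)
        ids = strategy.best_block_ids_for(server, state.unlocated, MAX_IDS_PER_DYHB)
        if wanted not in ids:
            # Make sure the block we most want is actually in the query.
            ids = [wanted] + ids[:MAX_IDS_PER_DYHB - 1]
        messages.append(("do you have blocks", server, ids))
        state.outstanding_dyhb += 1
    return messages
```

Keeping every policy decision behind the strategy object means the loop itself never has to change when the strategy does.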
BlockWrangler has lots of reality-check assertions to make sure that nothing
silly happens, such as sending a "do you have blocks" request for a block that
you have already downloaded.
(It even has an assertion like "You idiot! You want block XYZ, and server ABC
has said that he has it, so why are you sitting there and not sending 'request
block' messages to anyone right now?".)
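For illustration, the two reality checks mentioned above might look something like this. The function names and arguments are made up for the sketch; they are not the real BlockWrangler assertions.

```python
def check_dyhb_request(block_ids, downloaded):
    """Never ask a server about a block we have already downloaded."""
    already_have = set(block_ids) & set(downloaded)
    assert not already_have, (
        "sending 'do you have blocks' for blocks we already have: %r"
        % sorted(already_have)
    )


def check_not_idle(located, outstanding_block_requests):
    """If a wanted block has been located, some 'request block' must be in flight.

    located: dict mapping wanted block_id -> set of servers that said they have it
    outstanding_block_requests: number of 'request block' messages awaiting a reply
    """
    has_located_block = any(servers for servers in located.values())
    assert not has_located_block or outstanding_block_requests > 0, (
        "blocks are located but no 'request block' messages are outstanding"
    )
```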
So, we should probably use the same sort of design for the new download system.
I haven't looked at the code, but I presume this is somewhat close to what Myers
has already implemented.
Okay, all of this was just preface for the 'strategy' part. Here is a proposed
strategy. As I said, it isn't very well thought out.
The first question is when to act and when not. But I don't want to think about
it right now. Let's assume that this is pretty much independent of the other
questions.
The next question is, which block is best to download among located blocks?
This is probably the same as the question of which block is best to search for
among unlocated blocks. Possible answers include: (a) the block that is on the
fastest blockserver, (b) the block that is earliest in the stream (for
incremental download/incremental display), (c) the block that is least widely
replicated (for robustness -- in case the last copy is about to go off-line).
Well, (c) is not really known to us until we've gotten lots of "do you have
blocks" ("dyhb") responses about that block. We don't want to wait for lots of
"dyhb" responses before we start downloading, and I don't see how to do (c)
effectively without them, so I'll ignore it.
We don't *currently* do incremental anything, so I'll ignore (b). There, that
simplifies things. So I search for blocks, and download blocks, from the
fastest blockserver. The definition of "fastest" is actually pretty complicated
though! For downloading blocks, it is simple: the fastest blockserver is the
one that has the highest "download block rate", where "download block rate" is
defined as the average success rate of "request block" requests divided by the
average turnaround time on "request block" requests.
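Here is a small sketch of how a "download block rate" could be tracked per blockserver, taking the rate as success rate divided by average turnaround time so that a higher rate means a better server. `RequestStats` is a hypothetical helper, and averaging turnaround over successful requests only is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    attempts: int = 0
    successes: int = 0
    total_turnaround: float = 0.0  # seconds, summed over successful requests

    def record(self, succeeded, turnaround=0.0):
        """Record one finished "request block" request."""
        self.attempts += 1
        if succeeded:
            if turnaround <= 0:
                raise ValueError("turnaround of a successful request must be positive")
            self.successes += 1
            self.total_turnaround += turnaround

    def rate(self):
        """Success rate divided by average turnaround; 0.0 if nothing succeeded yet."""
        if self.successes == 0:
            return 0.0
        success_rate = self.successes / self.attempts
        avg_turnaround = self.total_turnaround / self.successes
        return success_rate / avg_turnaround


def fastest_server(stats_by_server):
    """Pick the server with the highest download block rate."""
    return max(stats_by_server, key=lambda server: stats_by_server[server].rate())
```

A server that answers quickly but fails most of the time scores low, as does a reliable but slow one.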
For searching for blocks, it is more complicated. The "fastest" server is the
one with the highest "locate and download block rate", where "locate and
download block rate" is defined as the "locate and download block success rate"
divided by the "locate and download block turnaround time". The turnaround time
is measured from sending a "do you have blocks" to receiving a "request block
response". That is: in order to download a block from a server, you have to
send it a "do you have blocks", receive a "do you have blocks response", then
send it a "request block", then receive a "request block response". The "locate
and download block turnaround time" is the time for that entire 4-message
sequence to complete.
The "locate and download block success rate" is the ratio of blocks that you got
from that server to blocks that you wanted from that server. If you decide that
you don't want the block then it doesn't count in the "locate and download block
success rate". (This can happen because you got the block from someone else or
because you completed the file using alternate shares. But if the user cancels
a download then we *should* count against everyone that we are currently trying
to get blocks from, since the user might have cancelled out of impatience.) If
the server says it doesn't have the block, or says that it does but then never
delivers it, then this counts against this success rate.
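The accounting rules in that paragraph can be written down as a small sketch. The `Outcome` names are invented for the illustration; the only rule taken from the post is which outcomes count and which do not.

```python
from enum import Enum


class Outcome(Enum):
    DELIVERED = "delivered"                    # we got the block from this server
    SERVER_LACKS_BLOCK = "server lacks block"  # server said it doesn't have it
    NEVER_DELIVERED = "never delivered"        # server said yes but never sent it
    NO_LONGER_WANTED = "no longer wanted"      # got it elsewhere, or file completed
    USER_CANCELLED = "user cancelled"          # user gave up, maybe out of impatience


# Only this outcome is left out of the ratio; everything else counts.
EXCLUDED = {Outcome.NO_LONGER_WANTED}


def locate_and_download_success_rate(outcomes):
    """Blocks gotten from a server divided by blocks wanted from it.

    Returns None when no outcome counts yet.
    """
    counted = [o for o in outcomes if o not in EXCLUDED]
    if not counted:
        return None
    delivered = sum(1 for o in counted if o is Outcome.DELIVERED)
    return delivered / len(counted)
```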
Okay, this sounds fine so far, but now I'm not sure about this:
Suppose I send "dyhb" queries over and over to the same blockserver asking about
the same blocks. Maybe I just haven't received its reply yet! So I should have
a constraint (ideally enforced by the BlockWrangler-type enforcer) that you
don't send a dyhb query to a server about a block that is currently an
outstanding dyhb request to that server ("outstanding" meaning hasn't passed its
soft time-out).
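A rough sketch of that constraint: remember when each (server, block) pair was last asked about, and refuse to ask again until the soft time-out has passed or a reply has arrived. The class name, the 30-second default, and the injectable clock are all assumptions made for the example.

```python
import time


class OutstandingDyhb:
    def __init__(self, soft_timeout=30.0, clock=time.monotonic):
        self.soft_timeout = soft_timeout  # seconds; the post gives no value
        self.clock = clock
        self._sent_at = {}                # (server, block_id) -> send time

    def is_outstanding(self, server, block_id):
        sent = self._sent_at.get((server, block_id))
        return sent is not None and self.clock() - sent < self.soft_timeout

    def filter_sendable(self, server, block_ids):
        """Drop blockIds that this server is still being asked about."""
        return [b for b in block_ids if not self.is_outstanding(server, b)]

    def record_sent(self, server, block_ids):
        now = self.clock()
        for b in block_ids:
            # The enforcer-style reality check: never re-ask too early.
            assert not self.is_outstanding(server, b), (
                "dyhb for %r is already outstanding at %r" % (b, server)
            )
            self._sent_at[(server, b)] = now

    def record_response(self, server, block_ids):
        """A reply arrived, so these requests are no longer outstanding."""
        for b in block_ids:
            self._sent_at.pop((server, b), None)
```

Passing the clock in makes the time-out easy to exercise in a test without sleeping.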
Okay, I'll stop for now. Any holes in this so far?
Oh -- there's a huge issue that I forgot to mention above. You want to use the
XOR metric to identify which blockservers are most likely to have which blocks
before you query them. How do you combine that with motivation (a): to use the
fastest blockserver? My current idea is sort of sloppy: use the
sum-of-squares-of-badness, i.e., the square of the XOR distance plus the square
of the inverse of the "locate and download block rate". This is what
MojoHandicapper has done since days of yore.
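As a sketch of the sum-of-squares idea (not MojoHandicapper's actual code): the 160-bit id width and the scaling of the XOR distance into the range 0 to 1 are assumptions added so the two terms are comparable in size, and the function names are hypothetical.

```python
ID_BITS = 160  # assumption: ids are treated as 160-bit integers


def xor_distance(block_id, server_id):
    """XOR distance between two integer ids, scaled into [0, 1)."""
    return (block_id ^ server_id) / float(1 << ID_BITS)


def badness(block_id, server_id, rate):
    """Square of the XOR distance plus square of the inverse of the rate."""
    if rate <= 0:
        return float("inf")  # a server with no measured rate is maximally bad
    return xor_distance(block_id, server_id) ** 2 + (1.0 / rate) ** 2


def best_server(block_id, rates_by_server):
    """rates_by_server maps server_id -> "locate and download block rate"."""
    return min(
        rates_by_server,
        key=lambda server_id: badness(block_id, server_id, rates_by_server[server_id]),
    )
```

How the two terms are scaled against each other decides which one dominates, so the weighting here is only a placeholder.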
Regards,
Zooko
http://zooko.com/
^-- under re-construction: some new stuff, some broken links