[mnet-devel] pre-print of Bram's p2p econ paper

Zooko zooko at zooko.com
Tue Mar 25 19:53:42 GMT 2003


Please don't forward as this is only a draft.  (RAH: this means you.)

I haven't read this paper yet.  It could be completely irrelevant for all I know.

--Z

------- Forwarded Message

Date: Tue, 25 Mar 2003 11:21:18 -0800 (PST)
From: Bram Cohen <bram at bitconjurer.org>
To: zooko at zooko.com
Subject: p2pecon draft 2

Abstract

The BitTorrent file distribution system uses tit-for-tat as a method of
seeking pareto efficiency. It achieves a higher level of robustness and
resource utilization than any currently known cooperative technique. We
explain what BitTorrent does, and how economic methods are used to achieve
that goal.

1. what BitTorrent does

When a file is made available using HTTP, all upload cost is placed on the
hosting machine. With BitTorrent, when multiple people are downloading the
same file at the same time, they upload pieces of the file to each
other. This redistributes the cost of upload to downloaders, (where it is
often not even metered), thus making hosting a file with a potentially
unlimited number of downloaders affordable.

BitTorrent's interface is almost the simplest possible. Users launch it by
clicking on a hyperlink to the file they wish to download, and are given a
standard "Save As" dialog, followed by a download progress dialog which is
mostly notable for having an upload rate in addition to a download
rate. This extreme ease of use has contributed greatly to BitTorrent's
adoption, and may even be more important than, although it certainly
complements, the performance and cost redistribution features which are
described in this paper.

The idea is not new. It has not been previously deployed on a large scale
because the logistical and robustness problems are quite difficult. Simply
figuring out which peers have what parts of the file and where they should
be sent is difficult to do without incurring a huge overhead. Adding to
that problem is the huge churn rates experienced in real
deployments. Peers are rarely up for more than a few hours, and frequently
only join for a few minutes. Finally, there is a general problem of
fairness. The amortized download rate must, of mathematical necessity, be
equal to the amortized upload rate. The strategy for allocating upload
which motivates peers to join the network best is to make each peer's
download rate be proportional to their upload rate. In practice it's very
difficult to keep peer download rates from sometimes dropping to zero by
chance, much less make upload and download rates be correlated. We will
explain how BitTorrent solves all of these problems well.

2. technical framework

To start a BitTorrent deployment, a static file with the extension
.torrent is put on an ordinary web server. The .torrent contains
information about the file, its length, name, and hashing information, and
the url of a tracker. Trackers speak a very simple protocol layered on top
of HTTP in which a downloader sends information about what file it's
downloading, what port it's listening on, and similar information, and the
tracker responds with a list of contact information for peers which are
downloading the same file. Downloaders then use this information to
connect to each other. To make a file available, a 'downloader' which
happens to have the complete file already, known as a seed, must be
started. The bandwidth requirements of the tracker and web server are very
low, while the seed must send out at least one complete copy of the
original file.

All logistical problems of file downloading are handled in the
interactions between peers. Some information about upload and download
rates is sent to the tracker, but that's just for statistics
gathering. The tracker's responsibilities are strictly limited to helping
peers find each other.

Despite trackers being the only way for peers to find each other, and the
only point of coordination at all, the standard algorithm for the tracker
is to return a random list of peers. Random graphs have very good
robustness properties. Many peer selection algorithms result in a power
law graph, which can get segmented after only a small amount of churn.

In order to keep track of which peers have what, BitTorrent cuts files
into pieces of fixed size, typically a quarter megabyte. Each downloader
reports to all of its peers what pieces it has. To verify data integrity,
the SHA1 hashes of all the pieces are included in the .torrent file, and
peers don't report that they have a piece until they've checked the
hash. Erasure codes have been suggested as a technique which might help
with file distribution, but this much simpler approach has proven to be
workable.

Peers continuously download pieces from all peers which they can. They of
course cannot download from peers they aren't connected to, and sometimes
peers don't have any pieces they want or won't currently let them
download. Strategies for peers disallowing downloading from them, known as
choking, will be discussed later. Other suggested approaches to file
distribution have generally involved a tree structure, which doesn't
utilize the upload capacity of leaves. Simply having peers announce what
they have results in less than a tenth of a percent bandwidth overhead and
reliably utilizes all available upload capacity.

When transferring data over TCP, like BitTorrent does, it is very
important to always have several requests pending at once, to avoid a
delay between pieces being sent, which is disastrous for transfer
rates. BitTorrent facilitates this by breaking pieces further into
sub-pieces over the wire, typically sixteen kilobytes in size, and always
keeping some number, typically ten, requests pipelined at once. Every time
a sub-piece arrives a new request is sent. The amount of data to pipeline
has been selected as a value which can reliably saturate most connections.

Selecting pieces to download in a good order is very important for good
performance. A poor piece selection algorithm can result in having all the
pieces which are currently on offer or, on the flip side, not having any
pieces to upload to peers you wish to.

BitTorrent's first policy for piece selection is that once a single
sub-piece has been requested, the remaining sub-pieces from that
particular piece are requested before sub-pieces from any other
piece. This does a good job of getting complete pieces as quickly as
possible.

When selecting which piece to start downloading next, peers generally
download pieces which the fewest of their own peers have first, a
technique we refer to as 'rarest first'. This heuristic does a good job of
making sure that peers have pieces which all of their peers want, so
uploading can be done when wanted. It also makes sure that pieces which
are more common are left for later, so the likelihood that a peer which
currently is offering upload will later not have anything of interest is
reduced.

An exception to rarest first is when downloading first begins. At that
time, the peer has nothing to upload, so it's important to get a complete
piece as quickly as possible. Rare pieces are generally only present on
one peer, so they would be downloaded slower than pieces which are present
on multiple peers for which it's possible to download sub-pieces from
different places. For this reason, pieces to download are selected at
random until the first complete piece is assembled, and then the strategy
changes to rarest first.

Information theory dictates that no downloaders can complete until every
part of the file has been uploaded by the seed. For deployments with a
single seed whose upload capacity is considerably less than that of many
downloaders, performance is much better if different downloaders get
different pieces from the seed, since redundant downloads waste the
opportunity for the seed to get more information out. Rarest first does a
good job of only downloading new pieces from the seed, since downloaders
will be able to see that their other peers have pieces the seed has
uploaded already.

For some deployments the original seed is eventually taken down for cost
reasons, leaving only current downloaders to upload. This leads to a very
significant risk of a particular piece no longer being available from any
current downloaders. Rarest first again handles this well, by replicating
the rarest pieces as quickly as possible thus reducing the risk of them
getting completely lost as current peers stop uploading.

Sometimes a piece will be requested from a peer with very slow transfer
rates. This isn't a problem in the middle of a download, but could
potentially delay a download's finish. To keep that from happening, once
all sub-pieces which a peer doesn't have are actively being requested it
sends requests for all sub-pieces to all peers. Cancels are sent for
sub-pieces which arrive to keep too much bandwidth from being wasted on
redundant sends. In practice not much bandwidth is wasted this way, and
the end of a file is always downloaded quickly.

3. choking algorithms

The decision of which peers to unchoke at any given time, known as a
choking algorithm, isn't technically part of the BitTorrent wire protocol,
but is necessary for good performance. A good choking algorithm should
utilize all available resources, provide reasonably consistent download
rates for everyone, and be somewhat resistant to peers only downloading
and not uploading.

Widely known economic theories show that systems which are pareto
efficient, meaning that no two counterparties can make an exchange and
both be happier, tend to have all of the above properties. In computer
science terms, seeking pareto efficiency is a local optimization algorithm
in which pairs of counterparties see if they can improve their lot
together, and such algorithms tend to lead to global optima. Specifically,
if two peers are both getting poor reciprocation for some of the upload
they are providing, they can often start uploading to each other instead
and both get a better download rate than they had before.

BitTorrent's choking algorithms attempt to achieve pareto efficiency using
a more fleshed out version of tit-for-tat than that used to play
prisoner's dilemma. Peers reciprocate uploading to peers which upload to
them, with the goal of at any time of having several connections which are
actively transferring in both directions. Unutilized connections are also
uploaded to on a trial basis to see if better transfer rates could be
found using them.

On a technical level, each BitTorrent peer always unchokes a fixed number
of other peers (default is four), so the issue becomes which peers to
unchoke. This approach allows TCP's built-in congestion control to
reliably saturate upload capacity.

Decisions as to which peers to unchoke are based strictly on current
download rate. Calculating current download rate meaningfully is a
surprisingly difficult problem; the current implementation essentially
uses a rolling 20-second average. Former implementations used information
about long-term net transfer amounts, but that performed poorly because
the value of bandwidth shifts rapidly over time as resources become
available and go away.

To avoid situations in which resources are wasted by rapidly choking and
unchoking peers, BitTorrent peers recalculate who they want to choke once
every ten seconds, and then leave the situation as is until the next ten
second period is up. Ten seconds is a long enough period of time for new
transfers to ramp up to their full capacity.

Simply uploading to the peers which provide the best download rate would
suffer from having no method of discovering if currently unused
connections are better than the ones being used. To fix this, at all times
a BitTorrent peer has a single 'optimistic unchoke', which is unchoked
regardless of the current download rate from it. Which peer is the
optimistic unchoke is rotated every third rechoke period (30 seconds). 30
seconds is enough time for the upload to get to full capacity, the
download to reciprocate, and the download to get to full capacity. The
analogy with tit-for-tat here is quite remarkable; Optimistic unchokes
correspond very strongly to always cooperating on the first move in
prisoner's dilemma.

Occasionally a BitTorrent peer will be choked by all peers which it was
formerly downloading from. In such cases it will usually continue to get
poor download rates until the optimistic unchoke finds better peers. To
mitigate this problem, when over a minute goes by without getting a single
piece from a particular peer, BitTorrent assumes it is 'snubbed' by that
peer and doesn't upload to it except as an optimistic unchoke. This
frequently results in more than one concurrent optimistic unchoke, (an
exception to the exactly one optimistic unchoke rule mentioned above),
which causes download rates to recover much more quickly when they falter.

Once a peer is done done downloading, it no longer has useful download
rates to decide which peers to upload to. The current implementation then
switches to preferring peers which it has better upload rates to, which
does a decent job of utilizing all available upload capacity and
preferring peers which noone else happens to be uploading to at the
moment.

4. real world experience

BitTorrent not only is already implemented, but is already widely
deployed. It routinely serves files hundreds of megabytes in size to
hundreds of concurrent downloaders. The largest known deployments have
been on the order of a thousand simultaneous downloaders. The current
scaling bottleneck (which hasn't actually been reached) appears to be the
bandwidth overhead of the tracker. Currently that's about a thousandth the
total amount of bandwidth used, and some minor protocol extensions will
probably get it down to a ten thousandth.






------- End of Forwarded Message



-------------------------------------------------------
This SF.net email is sponsored by:
The Definitive IT and Networking Event. Be There!
NetWorld+Interop Las Vegas 2003 -- Register today!
http://ads.sourceforge.net/cgi-bin/redirect.pl?keyn0001en
_______________________________________________
mnet-devel mailing list
mnet-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mnet-devel




More information about the Mnet-devel mailing list