[mnet-devel] about handling transient network outages
Zooko O'Whielacronx
zooko at zooko.com
Mon Mar 8 14:02:49 GMT 2004
> okay, it looks like osh got his MT working, so we should add him to
> bootpages.
Great!
> How can I decide which bootpage contains useful data?
I guess just eyeball them. I've appended the URLs and contents of the
bootpages to the end of this message.
> Where could bootpage owners get actual info then if they like to update?
I guess you have to send them e-mail and tell them what to change it to. :-)
Their e-mail addresses are all in the CREDITS file, with "D: bootpage op".
> Is there a way to avoid unchangeably hardcoded bootpages in a pack.
> Couldn't we load a page from mnet-webcvs from sf.net?
A bootpage-bootpage? But then, wouldn't we want to have multiple
bootpage-bootpages? And we wouldn't want to hardcode the list of
bootpage-bootpages into the source code, so instead we should have a
bootpage-bootpage-bootpage...
> Can we have timestamps in the bootpage infos so a broker could load all of
> them, and decide which one is the latest.
We can't rely on the clocks of the various bootpage servers being synchronized
with each other. *But* we do rely on the "sequence number" field inside the
contact infos (usually: see CommStrat.choose_best_strategy() for details), so
this is already solved.
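
To illustrate why sequence numbers make synchronized clocks unnecessary, here
is a minimal sketch. It is not the real CommStrat.choose_best_strategy(); the
helper name freshest_contact_info is hypothetical, and treating an entry
without a 'sequence num' field as sequence number 0 is an assumption. The
sample values are made up.

```python
def freshest_contact_info(contact_infos):
    """Return the contact info with the highest 'sequence num'.

    Hypothetical helper, not Mnet's actual code. Entries lacking the
    field are assumed to be older than any entry that has it.
    """
    if not contact_infos:
        return None
    return max(contact_infos, key=lambda ci: ci.get('sequence num', 0))


# Two made-up infos for the same node, as fetched from two bootpages.
stale = {'sequence num': 4, 'IP address': '192.0.2.10'}
fresh = {'sequence num': 51, 'IP address': '192.0.2.20'}
assert freshest_contact_info([stale, fresh]) is fresh
```

The comparison uses only a counter that the node itself increments, so it
doesn't matter what time the bootpage servers think it is.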
> But before we need to enable brokers to load them. I am afraid we never
> really reach the point of loading them (except for fresh broker startup)
> since there is always one (even stale) MT reported by peerman, no?
Good point! The bootpage loading code was written at a time when I thought
that peerman would give you an empty list when you asked it for MTs and there
were no known MTs. Since then, we changed it so that peerman always gives you
at least one MT.
First I'll tell you what I think should happen as I read through peerman.py,
then I'll tell you what I observe in <nw31m>'s behavior for the last 24 hours.
Okay, peerman.py says that a bootpage is supposed to be loaded in the following
cases:
1. On startup.
2. Whenever you want to look up a contact info, and there are no MTs known
(excluding those which are already involved in this lookup chain -- i.e. if
you want to find the contact info of an MT named <abcde>, and you decide to
ask an MT named <fghij>, then you don't try to ask <abcde> for the contact
info for <fghij>).
3. Whenever you want to do a "list servers" and there are no MTs known. (Note:
this never happens, since peerman always provides you with an MT, even if
it is stale.)
4. Whenever you try to do a "list servers" and it fails.
5. Every 24 hours.
6. One hour after it loaded the last bootpage URL in the list and needed to
reload the list. (Why? I don't know. I guess because without this it
would go 48 hours without loading every time it reached the end of the
list.)
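
The six cases above can be restated as one predicate. This is only a sketch of
my reading of the list, not the code in peerman.py: the function name, the
event names, and the parameters are all hypothetical.

```python
import time

DAY = 24 * 60 * 60
HOUR = 60 * 60


def should_load_bootpage(event, known_mts=(), last_load=None,
                         list_wrapped_at=None, now=None):
    """Hypothetical restatement of the six cases; not the peerman.py code.

    event: 'startup', 'lookup', 'list servers', 'list servers failed',
        or 'timer'.
    known_mts: MTs usable for this request (for a lookup, excluding
        those already involved in the lookup chain).
    last_load: time of the last bootpage load, or None.
    list_wrapped_at: time at which the last URL in the list was loaded
        and the list had to be reloaded, or None.
    """
    if now is None:
        now = time.time()
    if event == 'startup':                       # case 1
        return True
    if event in ('lookup', 'list servers'):      # cases 2 and 3
        return len(known_mts) == 0
    if event == 'list servers failed':           # case 4
        return True
    if event == 'timer':
        # case 5: every 24 hours
        if last_load is not None and now - last_load >= DAY:
            return True
        # case 6: one hour after wrapping around the URL list
        if list_wrapped_at is not None and now - list_wrapped_at >= HOUR:
            return True
    return False
```

Since peerman now always hands back at least one MT, known_mts is never empty
for a 'list servers' event, which is why case 3 cannot fire.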
I just looked at the log of <nw31m>, which has been running for about 19 hours.
It looks like #4 in the above list is working as intended -- whenever <nw31m>
tried to do a "list servers" and it failed, then <nw31m> loaded another
bootpage.
So even though the change in the semantics of peerman *has* broken #3 and
maybe partially broken #2, it has left #1, #4, #5, and #6 intact. How often do
you try to do a "list servers"? Well:
1. On startup.
2. Every 15 minutes.
3. Anytime that you tried to do a "list servers" and, for any reason, you didn't
get the satisfaction of at least one contact info. (This means you keep
trying and trying, including re-loading and re-loading bootpages, until you
get at least one contact info from a MetaTracker.)
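
The retry behavior in item 3 amounts to a loop like the following sketch. The
names are hypothetical, the two callables stand in for whatever the broker
really does, and the max_attempts cap exists only so that the example
terminates (the behavior described above keeps going until it succeeds). Any
delay between attempts is left out, since I haven't stated one here.

```python
def list_servers_until_satisfied(list_servers, load_next_bootpage,
                                 max_attempts=10):
    """Retry "list servers", loading another bootpage after each failure.

    Hypothetical sketch. list_servers() returns a list of contact infos
    (possibly empty); load_next_bootpage() loads the next bootpage URL.
    Returns (contact_infos, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        infos = list_servers()
        if infos:
            return infos, attempt
        load_next_bootpage()
    return [], max_attempts


# Simulated run: the first two queries fail, the third succeeds.
replies = iter([[], [], [{'sequence num': 7}]])
bootpage_loads = []
result, attempts = list_servers_until_satisfied(
    lambda: next(replies), lambda: bootpage_loads.append(1))
assert attempts == 3 and len(bootpage_loads) == 2
```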
> and we avoid to send a hello in many cases where a hello would be sensible.
> For instance: at startup, the first hello is scheduled quickly. The code
> decides not to send a hello since usually there is not yet a commstrat
> calculated for us.
> That is the slow startup bug, BTW,
I guess this happens when behind relay. It doesn't happen for <nw31m>. I just
now thought it through, and what you wrote here (partially elided) is exactly
right. The effect is that other nodes which are behind relay can't send
messages to your node, if it is also behind relay, until 15 minutes after your
node started up.
I'll add a note to the "slow startup" issue tracker entry.
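
The timing of the slow startup bug can be shown with a small model. This is a
sketch under assumptions, not the broker's scheduler: the function name is
hypothetical, and it assumes that a skipped hello is simply retried at the
next regular slot, taken here to be 15 minutes later.

```python
HELLO_INTERVAL = 15 * 60  # seconds; assumed from the 15-minute figure


def hello_times(first_hello_at, commstrat_ready_at, horizon):
    """Return the times (seconds since startup) at which a hello is sent.

    Hypothetical model: a hello is attempted at first_hello_at and every
    HELLO_INTERVAL after that, but is skipped while no commstrat has
    been calculated for us yet.
    """
    sent = []
    t = first_hello_at
    while t <= horizon:
        if t >= commstrat_ready_at:
            sent.append(t)
        t += HELLO_INTERVAL
    return sent


# First hello scheduled 5 s after startup, commstrat ready at 20 s:
# the hello at 5 s is skipped, so the first one goes out at 905 s.
assert hello_times(5, 20, 3600) == [905, 1805, 2705]
```

If the commstrat were ready before the first scheduled hello, the same model
sends it right away, e.g. hello_times(5, 0, 3600) starts at 5 seconds.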
This *shouldn't* happen when your IP address changes, because you *should* keep
using the same relay server even after your IP address changes. So I don't
think that this can be involved in any "network disconnect after transient
network outage" problem, but I'm not 100% sure.
Good work on diagnosing bugs!
Regards,
Zooko
------- here comes bootpage http://web.nilpotent.org/bootpage.txt
BROKER_VERSION_STR: 0.6.2.305-STABLE
MULTI_ROOT_ID_TRACKER_CONTACT_INFO: [{'pyver': '2.3.3a0 2, Nov 20 2003, 07:51:52 GCC 3.3.2 Debian', 'sequence num': 51, 'broker version': '0.7.0.103-UNSTABLE', 'platform': 'linux2', 'services': [{'type': 'meta tracker'}], 'connection strategies': [{'lowerstrategy': {'IP address': '64.231.152.244', 'port number': '27087', 'comm strat sequence num': 7, 'comm strategy type': 'TCP'}, 'pubkey': {'key header': {'usage': 'only for communication security', 'type': 'public', 'cryptosystem': 'RSA'}, 'key values': {'public modulus': 'vWtFNl7axG0JDy2_wGE7nd3La6O6HmwvKzowVUlaposP72x92AnK2HyD8kXzZne2k0-rE2NbwdBxUEE8my1eivf2ue4so_KKrU5_phXJRyTgV0Kwv6kXaB2RVGs--FzXPyZISRrL0jquU3p9G-N7JVAqXUmQqHjmHIKEGF4mDiE', 'public exponent': '3'}}, 'comm strat sequence num': 7, 'comm strategy type': 'crypto'}]}, {'broker version': '0.7.0.102-UNSTABLE', 'pyver': '2.3.2 1, Nov 6 2003, 05:22:58 GCC 3.3.1 FreeBSD', 'sequence num': 4, 'connection strategies': [{'lowerstrategy': {'IP address': '12.17.163.70', 'port number': '16859', 'comm strat sequence num': 1, 'comm strategy type': 'TCP'}, 'pubkey': {'key header': {'usage': 'only for communication security', 'type': 'public', 'cryptosystem': 'RSA'}, 'key values': {'public modulus': '4ejKj16Ok-k50E9U6QoSnLpNmUsqv1NEWYcsi94E16xlh3SvYB2Qp5E05__D2DRIPvQ3pdUpS6D4XEScdHIaE5UEHZAsvqDYtqSCKjuUcNawl4LCrRsp7nBej5r8EMq_gLaQcc0kBMEaoFv0OlailXNUHd6uF_DnnVaUaY6AqNs', 'public exponent': '3'}}, 'comm strat sequence num': 1, 'comm strategy type': 'crypto'}], 'platform': 'freebsd5'}]
------- here comes bootpage http://leitl.org/bootpage.txt
BROKER_VERSION_STR: 0.6.2.273-STABLE
MULTI_ROOT_ID_TRACKER_CONTACT_INFO: {'connection strategies': {'0': {'lowerstrategy': {'IP address': '64.231.179.226', 'port number': '27087', 'comm strat sequence num': 1, 'comm strategy type': 'TCP'}, 'pubkey': {'key header': {'usage': 'only for communication security', 'type': 'public', 'cryptosystem': 'RSA'}, 'key values': {'public modulus': 'vWtFNl7axG0JDy2_wGE7nd3La6O6HmwvKzowVUlaposP72x92AnK2HyD8kXzZne2k0-rE2NbwdBxUEE8my1eivf2ue4so_KKrU5_phXJRyTgV0Kwv6kXaB2RVGs--FzXPyZISRrL0jquU3p9G-N7JVAqXUmQqHjmHIKEGF4mDiE', 'public exponent': '3'}}, 'comm strat sequence num': 1, 'comm strategy type': 'crypto'}}}
------- here comes bootpage http://www.cryptnet.info/bootpage.html
BROKER_VERSION_STR: 0.6.2.279-STABLE
MULTI_ROOT_ID_TRACKER_CONTACT_INFO: {'connection strategies': {'0': {'lowerstrategy': {'IP address': '64.231.253.173', 'port number': '27087', 'comm strat sequence num': 5, 'comm strategy type': 'TCP'}, 'pubkey': {'key header': {'usage': 'only for communication security', 'type': 'public', 'cryptosystem': 'RSA'}, 'key values': {'public modulus': 'vWtFNl7axG0JDy2_wGE7nd3La6O6HmwvKzowVUlaposP72x92AnK2HyD8kXzZne2k0-rE2NbwdBxUEE8my1eivf2ue4so_KKrU5_phXJRyTgV0Kwv6kXaB2RVGs--FzXPyZISRrL0jquU3p9G-N7JVAqXUmQqHjmHIKEGF4mDiE', 'public exponent': '3'}}, 'comm strat sequence num': 5, 'comm strategy type': 'crypto'}}}
------- here comes bootpage http://www.boingy.org/~kyle/mnet/bootpage.txt
BROKER_VERSION_STR: 0.6.2.273-STABLE
MULTI_ROOT_ID_TRACKER_CONTACT_INFO: {'connection strategies': {'0': {'lowerstrategy': {'IP address': 'pion.zooko.com', 'port number': '27087', 'comm strat sequence num': 1, 'comm strategy type': 'TCP'}, 'pubkey': {'key header': {'usage': 'only for communication security', 'type': 'public', 'cryptosystem': 'RSA'}, 'key values': {'public modulus': 'vWtFNl7axG0JDy2_wGE7nd3La6O6HmwvKzowVUlaposP72x92AnK2HyD8kXzZne2k0-rE2NbwdBxUEE8my1eivf2ue4so_KKrU5_phXJRyTgV0Kwv6kXaB2RVGs--FzXPyZISRrL0jquU3p9G-N7JVAqXUmQqHjmHIKEGF4mDiE', 'public exponent': '3'}}, 'comm strat sequence num': 1, 'comm strategy type': 'crypto'}}}
------- here comes bootpage http://www.toehold.com/mnet/bootpage.txt
BROKER_VERSION_STR: 0.6.2.273-STABLE
MULTI_ROOT_ID_TRACKER_CONTACT_INFO: {'connection strategies': {'0': {'lowerstrategy': {'IP address': 'pion.zooko.com', 'port number': '27087', 'comm strat sequence num': 1, 'comm strategy type': 'TCP'}, 'pubkey': {'key header': {'usage': 'only for communication security', 'type': 'public', 'cryptosystem': 'RSA'}, 'key values': {'public modulus': 'vWtFNl7axG0JDy2_wGE7nd3La6O6HmwvKzowVUlaposP72x92AnK2HyD8kXzZne2k0-rE2NbwdBxUEE8my1eivf2ue4so_KKrU5_phXJRyTgV0Kwv6kXaB2RVGs--FzXPyZISRrL0jquU3p9G-N7JVAqXUmQqHjmHIKEGF4mDiE', 'public exponent': '3'}}, 'comm strat sequence num': 1, 'comm strategy type': 'crypto'}}}
------- here comes bootpage http://zooko.com/bootpage.txt
BROKER_VERSION_STR: 0.6.2.364-STABLE
MULTI_ROOT_ID_TRACKER_CONTACT_INFO: [{'pyver': '2.3.3 1, Mar 3 2004, 07:41:22 GCC 2.95.4 20011002 Debian prerelease', 'sequence num': 151, 'platform': 'linux2', 'node version': '0.7.0.118-UNSTABLE', 'services': [{'type': 'meta tracker'}], 'connection strategies': [{'lowerstrategy': {'IP address': '64.231.252.86', 'port number': '27087', 'comm strat sequence num': 28, 'comm strategy type': 'TCP'}, 'pubkey': {'key header': {'usage': 'only for communication security', 'type': 'public', 'cryptosystem': 'RSA'}, 'key values': {'public modulus': 'vWtFNl7axG0JDy2_wGE7nd3La6O6HmwvKzowVUlaposP72x92AnK2HyD8kXzZne2k0-rE2NbwdBxUEE8my1eivf2ue4so_KKrU5_phXJRyTgV0Kwv6kXaB2RVGs--FzXPyZISRrL0jquU3p9G-N7JVAqXUmQqHjmHIKEGF4mDiE', 'public exponent': '3'}}, 'comm strat sequence num': 28, 'comm strategy type': 'crypto'}]}]
------- here comes bootpage http://sspaeth.org/mnet/bootpage
BROKER_VERSION_STR: 0.6.2.273-STABLE
MULTI_ROOT_ID_TRACKER_CONTACT_INFO: {'connection strategies': {'0': {'lowerstrategy': {'IP address': '64.231.179.226', 'port number': '27087', 'comm strat sequence num': 1, 'comm strategy type': 'TCP'}, 'pubkey': {'key header': {'usage': 'only for communication security', 'type': 'public', 'cryptosystem': 'RSA'}, 'key values': {'public modulus': 'vWtFNl7axG0JDy2_wGE7nd3La6O6HmwvKzowVUlaposP72x92AnK2HyD8kXzZne2k0-rE2NbwdBxUEE8my1eivf2ue4so_KKrU5_phXJRyTgV0Kwv6kXaB2RVGs--FzXPyZISRrL0jquU3p9G-N7JVAqXUmQqHjmHIKEGF4mDiE', 'public exponent': '3'}}, 'comm strat sequence num': 1, 'comm strategy type': 'crypto'}}}
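
Instead of eyeballing the bootpages above, a small script could pull out the
fields worth comparing. This is only a sketch: parse_bootpage is a
hypothetical name, and it assumes that a bootpage is one "KEY: value" pair per
line with the contact info written as a Python literal, which is how the
pages appear in this message. It handles both shapes seen above (a list of
contact infos, or a single dict).

```python
import ast


def parse_bootpage(text):
    """Return (broker_version, contact_infos) for one bootpage.

    Hypothetical helper, not part of Mnet. Lines that do not look like
    "KEY: value" are ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(': ')
        if not sep:
            continue
        fields[key.strip()] = value.strip()
    raw = fields.get('MULTI_ROOT_ID_TRACKER_CONTACT_INFO')
    # literal_eval only accepts Python literals, so it cannot run code.
    infos = ast.literal_eval(raw) if raw else []
    if isinstance(infos, dict):
        infos = [infos]  # older pages hold a single dict
    return fields.get('BROKER_VERSION_STR'), infos


# Shortened stand-in for an old-format page (real pages carry more keys).
sample = ("BROKER_VERSION_STR: 0.6.2.273-STABLE\n"
          "MULTI_ROOT_ID_TRACKER_CONTACT_INFO: "
          "{'connection strategies': {'0': {'comm strategy type': 'crypto'}}}\n")
version, infos = parse_bootpage(sample)
for info in infos:
    print(version, info.get('sequence num'))
```

A page whose contact infos carry no 'sequence num' at all, like the sample, is
probably one of the ones whose owner should be asked to update it.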