[Mnet-devel] EGTP status / problems

baka at baka.baka baka at baka.baka
Sun Aug 8 13:46:49 BST 2004


in 0.7.1 several changes got applied to EGTP, and there will be more.

this broke compatibility with old/ancient EGTP nodes for the first time,
which will no doubt make the gazillions of active EGTP users trying
to interact with mnet 0.7.1 nodes unhappy. oh, wait, there arent any?
(so i guess it doesnt really matter that much. ;)

this also completed the EGTP / MNET separation which was one of the
goals of the 0.7 branch. its now possible to run an EGTP node
without ANY of the code in the mnetlib-dir. 
(see egtp/scripts/ for an example)

some of the reasons/goals/ideas/concepts for this are ...
- egtp doesnt use bsddb anymore (removed by zooko)
- egtp doesnt use a config-file anymore
- egtp works standalone (major shuffling of code)
- egtp contains the metatracker-server and -client
- egtp contains the relayserver-server and -client
- egtp depends on Twisted now (the bootpageloader. *sigh*)

there are some problems with parts of this. egtp still wants to
put its keypair and some stats somewhere. this may be a bug or 
a feature, depending on what you want it to do. 

why a bug? longer story. 
when the bsddb code was removed, this removed the persistence of
egtp-crypto-sessions. i assume (but am not 100% sure) this persistence
was there to avoid "frequent" public-key-crypto re-handshaking, since
this was considered to be cpu-expensive in the last century.
but removing the persistence ("it will just create a session-reset and
the nodes will re-handshake") uncovered a there-all-the-time EGTP
problem: the session-deadlock. longer substory.

the symmetric-crypto EGTP messages use a session-ID to tell the
recipient which key to use for decryption. if the recipient doesnt
know the session-ID, it will try to send a "session reset" message
asking the sender to re-negotiate a crypto-session.

the session-reset works only for egtp-messages received via a
TCP-commstrat. it wont work (and never did) for messages received via
a relay. repeat after me: an EGTP node will NEVER be able to reset on an
unknown-session-message it received from a relay-server.
this is just a problem for nodes using a relay-client to poll a
relay-server. but this also explains why some RC-nodes have gone
silent in the past, whenever they ran into some bsddb problems with
their session-DB. 
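
a minimal sketch of the lookup-and-reset behaviour described above.
SessionTable, handle_incoming and the returned tuples are made-up names
for illustration, not the real EGTP code; "can_reply" stands in for
"this message came in via a commstrat that lets us send a reset back"
(TCP yes, relay no).

```python
# sketch only: names and return values are hypothetical, not EGTP's API.

class SessionTable:
    def __init__(self):
        self._keys = {}  # session-ID -> symmetric key

    def add(self, session_id, key):
        self._keys[session_id] = key

    def handle_incoming(self, session_id, can_reply):
        """Decide what to do with a symmetric-crypto message."""
        key = self._keys.get(session_id)
        if key is not None:
            return ("decrypt", key)
        if can_reply:
            # e.g. TCP: ask the sender to re-negotiate the session
            return ("reset", session_id)
        # e.g. picked up from a relay-server: no way to send the reset,
        # so the sender keeps using a session we cannot read
        return ("stuck", session_id)
```

the "stuck" branch is the deadlock: nothing ever tells the sender to
stop using that session-ID.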

with the bsddb-backed persistent sessions it was both less-likely to
run into this problem, and more fatal when it happened. because if it
happens, the deadlock will be persistent as well.

with the nonpersistent sessions, it happens all the time (whenever
a node gets restarted), but it can recover by restarting all involved
nodes at the same time. (thats why i never noticed it on my 
test-networks, where all 8 nodes get restarted in-sync by a script)

so, how can this be fixed ... several possible ways.

1) drop relay support. this was already done by someone who never
imagined a 0.7 node might get used as a client. 
but i like the relay, its one of the few remaining interesting
features mnet has. 
its also the base some of the even more interesting features use.
to treat that as a "solution" would also require adding a
"supports session reset" requirement to all potential future commstrats.
so, perhaps not this "solution".

2) drop keypair persistence. i havent thought about this one much
yet, it only came to mind when i realized that EGTP _still_ wants
to save something to disk even if not using a MNET-config.
the only thing this would break that comes to mind immediately 
is the response-times stuff, and thats not exactly working or
important anyways. dunno. i guess i will not treat this as a 
solution either, but make the keypair-persistence (and everything
else persistent in EGTP) optional anyways. sounds like a fun
feature at least. :)

3) drop session for _outgoing_ use after not _receiving_ a message
with it for $time (defaulting to 1h or so). this will result in
*gasp* expensive PK operations from time to time. but it will also
fix any comms deadlock after $time, where 1h might sound like a
long outage, but currently the duration for a comms-outage (if it
happens) is "until it gets cold in the warm place down there", so
one hour would be a huge improvement. (did i mention that there is
an active deployed 0.7.1 node on the public testnet with >50d uptime?)
this may not be the most scientific "solution", but a pretty simple
and reliable one. _unless_ there is more to the "stick to your sessions"
than "PK-OPS are expensive". zooko?
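
roughly what (3) could look like, as a sketch. OutgoingSessions and its
methods are invented names, and "note_received" is assumed to be called
whenever a message from that peer decrypts fine; the real bookkeeping
in EGTP may look quite different. the clock is passed in so the thing
can be tested without waiting an hour.

```python
import time

# sketch only: class and method names are hypothetical.

class OutgoingSessions:
    def __init__(self, max_idle=3600.0, clock=time.time):
        self.max_idle = max_idle   # $time, in seconds (1h default)
        self.clock = clock
        self._sessions = {}        # peer_id -> [session_id, last_received]

    def establish(self, peer_id, session_id):
        self._sessions[peer_id] = [session_id, self.clock()]

    def note_received(self, peer_id):
        # the peer is obviously alive and talking to us: refresh the timer
        entry = self._sessions.get(peer_id)
        if entry is not None:
            entry[1] = self.clock()

    def session_for_sending(self, peer_id):
        entry = self._sessions.get(peer_id)
        if entry is None:
            return None
        if self.clock() - entry[1] > self.max_idle:
            # heard nothing for too long: drop it, caller must do a
            # fresh (expensive) PK handshake
            del self._sessions[peer_id]
            return None
        return entry[0]
```

a None from session_for_sending means "re-handshake", so a deadlocked
pair of nodes recovers after at most max_idle seconds.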

4) add something to session negotiation that will allow nodes to
avoid the asymmetric-sessions situation on crossover-handshake.

(uhm, ok. lemme explain: if the EGTP node gets a "lets establish
 a session" request from a peer it already has an _outgoing_ session
 for, it assumes the "establish session" messages passed each other
 on the network and will keep using its own outbound session, but
 accept the received session for receiving. a "the other node restarted
 and forgot about our established session" looks the same to the
 non-restarted node. so it will keep _sending_ with its own outgoing
 session and, if the MSGs arrive at the other node via a non-reset
 commstrat, will never get told that this is a bad idea.)
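
the same story as a toy model, in case it helps. Node, start_session,
on_establish and restart_scenario are all invented for this sketch, and
it only models the one rule described above (keep your own outgoing
session, accept the received one for receiving). it says nothing about
what real EGTP does when there is no outgoing session yet.

```python
# toy model only: nothing here is real EGTP code.

class Node:
    def __init__(self):
        self.outgoing = {}     # peer name -> session-ID we encrypt with
        self.readable = set()  # session-IDs we hold a key for

    def start_session(self, peer, session_id):
        self.outgoing[peer] = session_id
        self.readable.add(session_id)

    def on_establish(self, peer, session_id):
        # accept the peer's session for receiving, but leave our own
        # outgoing session alone (the crossover rule)
        self.readable.add(session_id)


def restart_scenario():
    a = Node()
    old_b = Node()
    a.start_session("b", "sA")
    old_b.on_establish("a", "sA")  # before the restart, b could read sA
    b = Node()                     # b restarts and forgets everything
    b.start_session("a", "sB")
    a.on_establish("b", "sB")
    a_to_b_ok = a.outgoing["b"] in b.readable
    b_to_a_ok = b.outgoing["a"] in a.readable
    return a_to_b_ok, b_to_a_ok
```

restart_scenario() returns (False, True): b can talk to a, but
everything a sends to b uses a session b no longer knows.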

(resumed 4) so, simply add a timestamp to the "establish session" MSG,
always use the session with the higher stamp. (what if the stamps are
equal? uhm, use the one generated by the node with the higher ID.
(what if the nodeIDs are equal as well? raise ItsYourselfYouFool!))
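
the tie-break above as a small sketch. the (timestamp, node_id,
session_id) tuple layout and the pick_session name are assumptions made
up for the example; only the rule itself (higher stamp wins, then
higher node ID) comes from the text.

```python
# sketch only: tuple layout and function name are hypothetical.

class ItsYourselfYouFool(Exception):
    pass


def pick_session(mine, theirs):
    """Each argument is (timestamp, node_id, session_id); return the winner."""
    if mine[0] != theirs[0]:
        # higher timestamp wins
        return mine if mine[0] > theirs[0] else theirs
    if mine[1] != theirs[1]:
        # equal stamps: the node with the higher ID wins
        return mine if mine[1] > theirs[1] else theirs
    raise ItsYourselfYouFool("same timestamp and same node ID")
```

since both nodes apply the same rule to the same two offers, both end
up picking the same session.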

so, this (4) thing raises some interesting opportunities in subtle
implementational details.

4a) a paranoid EGTP would accept only session-setup MSGS that are not
too-old (windowsize option, CONFIGURABLE PARANOIALEVEL!) (and not from
the future. aeh.) but that would add a "have a good system clock" to the
requirements for running a node. (while i dont understand how that is a
problem, i recently had to find out that there are a lot of _servers_
with bad clocks out there on this internet thing, resulting in
"interesting" effects on anticentral NNTP-style flooding databases with
lowest-timestamp deduping)
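
the (4a) check could be as small as this. setup_is_fresh and the 300s
default window are made up for illustration; a real version would
probably want some tolerance for clock skew, which this sketch
deliberately does not have.

```python
# sketch only: function name and default window are hypothetical.

def setup_is_fresh(stamp, now, window=300.0):
    """Accept a session-setup MSG only if it is not too old and not
    from the future. stamp and now are seconds since the epoch."""
    age = now - stamp
    return 0 <= age <= window
```

with window as the configurable paranoia level: smaller window, more
paranoia, more nodes with bad clocks locked out.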

4b) a not-so-paranoid EGTP would just use it to scratch dead sessions,
and in those cases where bad system clocks prevent it from doing that
job, the (3) mechanism will still do it.

4c) doesnt the (4a) version even offer limited protection against the
oh-so-scary replay attacks? uh-uh. dont want to think about it. just
want a working EGTP right now.

...

whatever. i have been rolling this problem in my head for some weeks
now, 3+4 is the best i came up with so far, and 2 just for the fun of it.
unless someone (zooko?) points out some misunderstanding on my part
about the session-persistence, or someone comes up with a WORKING
replacement for the relayer mechanism, i will start working in the
3+4(+2) direction soonish.

and its too warm here. anything except BRIEF replies that are NOT
trying to drown me in whitepaper-references will be ignored. 
mentioning EGTPv2_Architecture.txt will get you some ICBMs before 
being ignored.

in case my description of the problem raises questions, it might be
easiest to ask those in irc://irc.freenode.net/mnet

*melt*


