Tuesday, November 14, 2017

TLS close-notify .... what were they thinking?

Close-Notify Idiocy?

TLS (and presumably SSL) requires that implementations send a special disconnect message, "close-notify", when closing a connection.  The precise language (from TLS v1.2, RFC 5246, section 7.2.1) reads:

The client and the server must share knowledge that the connection is
ending in order to avoid a truncation attack. Either party may
initiate the exchange of closing messages.

close_notify
This message notifies the recipient that the sender will not send
any more messages on this connection. Note that as of TLS 1.1,
failure to properly close a connection no longer requires that a
session not be resumed. This is a change from TLS 1.0 to conform
with widespread implementation practice.

Either party may initiate a close by sending a close_notify alert.
Any data received after a closure alert is ignored.

Unless some other fatal alert has been transmitted, each party is
required to send a close_notify alert before closing the write side
of the connection. The other party MUST respond with a close_notify
alert of its own and close down the connection immediately,
discarding any pending writes. It is not required for the initiator
of the close to wait for the responding close_notify alert before
closing the read side of the connection.

This has to be one of the stupider designs I've seen.

The stated reason for this is to prevent a "truncation attack", where an attacker terminates the session by sending a clear-text disconnect (TCP FIN) message, presumably just before you log out of some sensitive service, say Gmail.

The stupid thing here is that this exists for web apps that want to send a logout but don't want to wait for confirmation that the logout has occurred before confirming it to the user.  So this logout is unlike every other RPC.  What...?!?
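For comparison, here is roughly what logout looks like when it is treated like any other RPC. This is only a sketch under stated assumptions: the LOGOUT/OK line protocol and the logout() helper are made up for illustration, and sock stands for any connected stream socket (a TLS-wrapped one would look the same at this level). The point is that the client reports success only after an explicit reply, so a connection cut before the reply simply reads as "not confirmed".

```python
import socket


def logout(sock, timeout=2.0):
    """Return True only if the peer explicitly confirmed the logout.

    Hypothetical line protocol: send a LOGOUT line, expect an OK line back.
    """
    sock.settimeout(timeout)
    reply = b""
    try:
        sock.sendall(b"LOGOUT\n")
        # Read until the reply line is complete or the peer goes away.
        while not reply.endswith(b"\n"):
            chunk = sock.recv(16)
            if not chunk:  # EOF before a full reply: the connection was cut
                break
            reply += chunk
    except OSError:  # timeout, reset, broken pipe, ...
        return False
    return reply == b"OK\n"
```

With this shape, a forged FIN can at worst turn a logout into a visible failure; it cannot make the client believe a logout happened when it did not.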

Practical Exploit?

It's not even clear how one would use this attack to compromise a system... an attacker won't be able to hijack the actual TLS session unless they already pwned your encryption.  (In which case, game over, no need for truncation attacks.)  The idea in the truncation attack is that one side (the server?) still thinks the connection is alive, while the other (the browser?) thinks it is closed.  I guess this could be used to cause extra resource leaks on the server... but that's what keep-alives are for, right?

Bugs Everywhere

Of course, close-notify is the source of many bugs (pretty much none of them security critical) in TLS implementations.  Go ahead, Google... I'll wait...  Java, Microsoft, and many others have struggled in implementing this part of the RFC.

Even the TLS v1.1 authors recognized that "widespread implementation practice" is simply to ignore this part of the specification and close the TCP channel.

So you may be asking yourself, why don't implementations send the close-notify ... after all, sending a single message seems pretty straightforward and simple, right?

Semantic Overreach

Well, the thing is that on many occasions, the application is closing down.  Historically, operating systems would just close() their file descriptors on exit().  Even for long running applications, the quick way to abort a connection is ... close().  With no notification.  Application developers expect that close() is a non-blocking operation on network connections (and most everywhere else) [1].

Guess what: you can no longer exit your application without sending this message, at least not without breaking the RFC.   That's right, this RFC changes the semantics of exit(2).  Whoa.

That's a little presumptive, dontcha think?

Requiring implementations to send this message means that close() now grows a new semantic, where the application has to stop and wait for the message to be delivered.  Which means TCP has to be flowing and healthy.  If it isn't, the only RFC-compliant behavior is to block and wait for it to flow.

What happens if the other side is stuck, and doesn't read, leading to a TCP flow control condition?  You can't send the message, because the kernel TCP code won't accept it -- write() would block, and if you're in a non-blocking or event driven model, the event will simply never occur.  Your close() now blocks forever.
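The stall is easy to reproduce without any TLS at all. The sketch below is an illustration, not a TLS implementation: it uses socket.socketpair() as a stand-in for a TCP connection, and the helper names fill_send_buffer() and wait_writable() are mine. One side never reads, the other fills its send buffer, and then the "writable" event that an event loop would need before it could send one more message (such as a close-notify) does not arrive.

```python
import select
import socket


def fill_send_buffer(sock, chunk=b"x" * 65536):
    """Queue data until the kernel refuses more; return bytes accepted."""
    sock.setblocking(False)
    queued = 0
    try:
        while True:
            queued += sock.send(chunk)
    except BlockingIOError:
        return queued


def wait_writable(sock, timeout):
    """Return True if the socket becomes writable within `timeout` seconds."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)


# `peer` stands in for a remote side that is stuck and never reads.
ours, peer = socket.socketpair()
queued = fill_send_buffer(ours)
can_send_alert = wait_writable(ours, timeout=0.2)
```

Here can_send_alert comes back False for as long as the peer refuses to read, however long the timeout is made.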

Defensively, you must insert a timeout somehow -- in violation of the RFC.  Otherwise your TCP session could block forever.  And now you have to contemplate how long to hold the channel open.  You've already decided (for whatever other reason) to abort the session, but you now have to wait a while ... how long is too long?  And meanwhile this open TCP connection sits around consuming buffer space, an open file descriptor, and perhaps other resources....
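A defensive close along those lines might look something like the following. Treat it as a sketch with assumptions: close_with_deadline() is a name I made up, the 2 second default is arbitrary (which is exactly the problem), and farewell is a placeholder byte string so the example runs without certificates. With Python's ssl module, the real farewell would be the shutdown handshake performed by SSLSocket.unwrap().

```python
import socket


def close_with_deadline(sock, farewell, timeout=2.0):
    """Try to deliver `farewell`, then close no matter what.

    Returns True if the kernel accepted the whole message in time.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(farewell)
        return True
    except OSError:  # includes timeouts, resets and broken pipes
        return False
    finally:
        sock.close()
```

Note what the return value means: True only says the message was handed to the kernel before the deadline, not that the peer ever saw it.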

A Bit of Sanity

The sensible course of action, treating a connection abort for any reason as an implicit close notification, was simply "not considered" from what I can tell.

In my own application protocols, when using TLS, I may violate this RFC with prejudice. But then I also am not doing stupid things in the protocol like TCP connection reuse.  If you close the connection, all application state with that connection goes away.  Period.  Kind of ... logical, right?
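In code, that rule is about as simple as it sounds. The sketch below is an assumption-laden illustration rather than anything from a real server: serve_connection() and the sessions dict are hypothetical, and the per-connection state is a dummy. What matters is the finally block: a clean EOF, a reset, and a timeout all end in the same place, with the state gone.

```python
import socket


def serve_connection(conn, sessions):
    """Handle one connection; its state lives exactly as long as it does."""
    key = id(conn)
    sessions[key] = {"logged_in": True}  # hypothetical per-connection state
    try:
        while True:
            data = conn.recv(4096)
            if not data:  # orderly EOF (TCP FIN)
                break
            # ... application protocol handling would go here ...
    except OSError:  # reset, timeout, or any other abort
        pass
    finally:
        # However the connection ended, treat it as closed.
        sessions.pop(key, None)
        conn.close()
```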

Standards bodies be damned.

1. The exception here is historical tape devices, which might actually perform operations like rewinding the tape automatically upon close(). I think this semantic is probably lost in the mists of time for most of us.
