There are a lot of design alternatives possible to TCP within the "create a reliable stream of data on top of an unreliable datagram layer" space:
• Full-duplex connections are probably a good idea, but certainly are not the only way, or the most obvious way, to create a reliable stream of data on top of an unreliable datagram layer. TCP's predecessor NCP was half-duplex.
• TCP itself also supports a half-duplex mode—even if one end sends FIN, the other end can keep transmitting as long as it wants. This was probably also a good idea, but it's certainly not the only obvious choice.
• Sequence numbers on messages or on bytes?
• Wouldn't it be useful to expose message boundaries to applications, the way 9P, SCTP, and some SNA protocols do?
• If you expose message boundaries to applications, maybe you'd also want to include a message type field? Protocol-level message-type fields have been found to be very useful in Ethernet and IP, and in a sense the port-number field in UDP is also a message-type field.
• Do you really need urgent data?
• Do servers need different port numbers? TCPMUX is a straightforward way of giving your servers port names, like in CHAOSNET, instead of port numbers. It only creates extra overhead at connection-opening time, assuming you have the moral equivalent of file descriptor passing on your OS. The only limitation is that you have to use different client ports for multiple simultaneous connections to the same server host. But in TCP everyone uses different client ports for different connections anyway. TCPMUX itself incurs an extra round-trip time delay for connection establishment, because the requested server name can't be transmitted until the client's ACK packet, but if you incorporated it into TCP, you'd put the server name in the SYN packet. If you eliminate the server port number in every TCP header, you can expand the client port number to 24 or even 32 bits.
• Alternatively, maybe network addresses should be assigned to server processes, as in Appletalk (or IP-based virtual hosting before HTTP/1.1's Host: header, or, for TLS, before SNI became widespread), rather than assigning network addresses to hosts and requiring port numbers or TCPMUX to distinguish multiple servers on the same host?
• Probably SACK was actually a good idea and should have always been the default? SACK gets a lot easier if you ack message numbers instead of byte numbers.
• Why is acknowledgement reneging allowed in TCP? That was a terrible idea.
• It turns out that measuring round-trip time is really important for retransmission, and TCP has no way of measuring RTT on retransmitted packets, which can pose real problems for correcting a ridiculously low RTT estimate, which results in excessive retransmission.
• Do you really need a PUSH bit? C'mon.
• A modest amount of overhead in the form of erasure-coding bits would permit recovery from modest amounts of packet loss without incurring retransmission timeouts, which is especially useful if your TCP-layer protocol requires a modest amount of packet loss for congestion control, as TCP does.
• Also you could use a "congestion experienced" bit instead of packet loss to detect congestion in the usual case. (TCP did eventually acquire CWR and ECE, but not for many years.)
• The fact that you can't resume a TCP connection from a different IP address, the way you can with a Mosh connection, is a serious flaw that seriously impedes nodes from moving around the network.
• TCP's hardcoded timeout of 5 minutes is also a major flaw. Wouldn't it be better if the application could set that to 1 hour, 90 minutes, 12 hours, or a week, to handle intermittent connectivity, such as with communication satellites? Similarly for very-long-latency datagrams, such as those relayed by single LEO satellites. Together this and the previous flaw have resulted in TCP largely being replaced for its original session-management purpose with new ad-hoc protocols such as HTTP magic cookies, protocols which use TCP, if at all, merely as a reliable datagram protocol.
• Initial sequence numbers turn out not to be a very good defense against IP spoofing, because that wasn't their original purpose. Their original purpose was preventing the erroneous reception of leftover TCP segments from a previous incarnation of the connection that have been bouncing around routers ever since; this purpose would be better served by using a different client port number for each new connection. The ISN namespace is far too small for current LFNs anyway, so we had to patch over the hole in TCP with timestamps and PAWS.
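To make the erasure-coding point concrete, here's a minimal sketch (Python, with made-up packet contents) of the simplest possible scheme: one XOR parity block per group of data blocks, which lets the receiver repair any single lost block in the group without waiting out a retransmission timeout.

```python
def xor_parity(blocks):
    """XOR equal-sized blocks together; real schemes pad blocks to equal size."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def recover(blocks, parity):
    """Recover a single lost block (marked None) from the others plus parity."""
    missing = blocks.index(None)
    present = [b for b in blocks if b is not None]
    repaired = list(blocks)
    repaired[missing] = xor_parity(present + [parity])
    return repaired

group = [b"pkt1", b"pkt2", b"pkt3"]   # three data packets (made up)
parity = xor_parity(group)            # one extra packet of overhead
damaged = [b"pkt1", None, b"pkt3"]    # pkt2 lost in transit
assert recover(damaged, parity)[1] == b"pkt2"
```

Real FEC schemes (Reed-Solomon, fountain codes) tolerate more than one loss per group, but the overhead-versus-recovery trade-off is the same.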
I just want to say that it's refreshing to stumble onto someone commenting in the same style that I do. Where most people see things that are good enough, hard to fix or innovative, I see things for their fatal flaws, how they should have been done right from the start and why they are obvious. So I'll just add my list of gripes about TCP that in many ways ruined the internet for decades, and maybe still do:
- TCP should have been a reliability layer above UDP, not beside it (made P2P harder than it should be, mainly burdening teleconferencing and video games)
- Window size field bytes should have been arbitrary length
- Checksum size field bytes should have been arbitrary length and the algorithm should have been optionally customizable
- Ports should have been unique binary strings of arbitrary length instead of numbers, and not limited in count (as mentioned)
- Streams should have been encrypted by default, with clear transmission as the special case (symmetric key encryption was invented before TCP)
- IP should have connected to an arbitrary peer ID, not a MAC address, for resumable sessions if network changes (maybe only securable with encryption)
- Encrypted streams should not have been on a special port for HTTPS (not TCP's fault)
- IP address field bytes should have been arbitrary length (not TCP's fault)
- File descriptors could have been universal instead of using network sockets, unix sockets, files, pipes and bind/listen/accept/select (not TCP's fault)
- Streams don't actually make sense in the first place, we needed state transfer with arbitrary datagram size and partial sends/ranges (not TCP's fault)
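On the "arbitrary length fields" gripes: the usual way to put variable-length integers on the wire is a LEB128-style varint, as protobuf does. A rough Python sketch of the idea:

```python
def encode_varint(n):
    """Encode a non-negative int as a varint: 7 payload bits per byte,
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data):
    """Return (value, bytes consumed)."""
    n = shift = 0
    for i, byte in enumerate(data):
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return n, i + 1
    raise ValueError("truncated varint")

assert encode_varint(300) == b"\xac\x02"        # same encoding protobuf uses
assert decode_varint(encode_varint(2**64))[0] == 2**64
```

Small values cost one byte, and nothing in the format caps window sizes, ports, or addresses at a fixed width, which is exactly the flexibility the fixed 16-bit fields foreclose.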
Linking this to my "why your tunnel won't work" checklist:
AppleTalk didn't get much love for its broadcast (or possibly multicast?) based service discovery protocol - but of course that is what inspired mDNS. I believe AppleTalk's LAN addresses were always dynamic (like 169.x IP addresses), simplifying administration and deployment.
I tend to think that one of the reasons linux containers are needed for network services is that DNS traditionally only returns an IP address (rather than address + port) so each service process needs to have its own IP address, which in linux requires a container or at least a network namespace.
AppleTalk also supported a reliable transaction (basically request-response RPC) protocol (ATP) and a session protocol, which I believe were used for Mac network services (printing, file servers, etc.). Certainly easier than serializing/deserializing byte streams.
Does "session protocol" mean that it provided packet retransmission and reordering, like TCP? How does that save you serializing and deserializing byte streams?
I agree that, given the existing design of IP and TCP, you could get much of the benefit of first-class addresses for services by using, for example, DNS-SD, and that is what ZeroConf does. (It is not a coincidence that the DNS-SD RFC was written by a couple of Apple employees.) But, if that's the way you're going to be finding endpoints to initiate connections to, there's no benefit to having separate port numbers and IP addresses. And IP addresses are far scarcer than just requiring a Linux container or a network namespace: there are only 2³² of them. But it is rare to find an IP address that is listening on more than 64 of its 2¹⁶ TCP ports, so in an alternate history where you moved those 16 bits from the port number to the IP address, we would have one thousandth of the IP-address crunch that we do.
Historically, possibly the reason that it wasn't done this way is that port numbers predated the DNS by about 10 years.
Mockapetris's DNS RFCs are from 01983, although I think I've talked to people who installed DNS a year or two before that. Port numbers were first proposed in RFC 38 in 01970 https://datatracker.ietf.org/doc/html/rfc38
> The END and RDY must specify relevant sockets in addition to the link number. Only the local socket name need be supplied
> Connections are named by a pair of sockets. Sockets are 40 bit names which are known throughout the network. Each host is assigned a private subset of these names, and a command which requests a connection names one socket which is local to the requesting host and one local to the receiver of the request.
> Sockets are polarized; even numbered sockets are receive sockets; odd numbered ones are send sockets. One of each is required to make a connection.
In RFC 129 in 01971 we see discussion about whether socketnames should include host numbers and/or user numbers, still with the low-order bit indicating the socket's gender (emissive or receptive). https://datatracker.ietf.org/doc/html/rfc129
RFC 147 later that year https://datatracker.ietf.org/doc/html/rfc147 discusses within-machine port numbers and how they should or should not relate to the socketnames transmitted in NCP packets:
> Previous network papers postulated that a process running under control of the host's operating system would have access to a number of ports. A port might be a physical input or output device, or a logical I/O device (...)
> A socket has been defined to be the identification of a port for machine to machine communication through the ARPA network. Sockets allocated to each host must be uniquely associated with a known process or be undefined. The name of some sockets must be universally known and associated with a known process operating with a specified protocol (e.g., a logger socket, RJE socket, a file transfer socket). The name of other sockets might not be universally known, but given in a transmission over a universally known socket (e.g. the socket pair specified by the transmission over the logger socket under the Initial Connection Protocol (ICP)). In any case, communication over the network is from one socket to another socket, each socket being identified with a process running at a known host.
RFC 167 the same year https://datatracker.ietf.org/doc/html/rfc167 proposes that socketnames not be required to be unique network-wide but just within a host. It also points out that you really only need the socketname during the initial connection process, if you have some other way of knowing which packets belong to which connections:
> Although fields will be helpful in dealing with socket number allocation, it is not essential that such field designations be uniform over the network. In all network transactions the 32-bit socket number is handled with its 8-bit host number. Thus, if hosts are able to maintain uniqueness and repeatability internally, socket numbers in the network as a whole will also be unique and repeatable. If a host fails to do so, only connections with that offending host are affected.
> Because the size, use, and character of systems on the network are so varied, it would be difficult if not impossible to come up with an agreed upon particular division of the 32-bit socket number. Hosts have different internal restrictions on the number of users, processes per user, and connections per process they will permit.
> It has been suggested that it may not be necessary to maintain socket uniqueness. It is contended that there is really no significant use made of the socket number after a connection has been established. The only reason a host must now save a socket number for the life of a connection is to include it in the CLOSE of that connection.
> Initial Connection will be as per the Official Initial Connection Protocol, Documents #2, NIC 7101, to a standard socket not yet assigned. A candidate socket number would be socket #5.
> I would like to collect information on the use of socket numbers for "standard" service programs. For example Loggers (telnet servers) Listen on socket 1. What sockets at your host are Listened to by what programs?
> Recently Dick Watson suggested assigning socket 5 for use by a mail-box protocol (RFC196). Does any one object? Are there any suggestions for a method of assigning sockets to standard programs? Should a subset of the socket numbers be reserved for use by future standard protocols?
> Please phone or mail your answers and comments to (...)
Amusingly in retrospect, Postel did not include an email address, presumably because they didn't have email working yet.
FTP's assignment to port 3 was confirmed in RFC 265 in November:
> Socket 3 is the standard preassigned socket number on which the cooperating file transfer process at the serving host should "listen". The connection establishment will be in accordance with the standard initial connection protocol, establishing a full-duplex connection.
> I propose that there be a czar (me ?) who hands out official socket numbers for use by standard protocols. This czar should also keep track of and publish a list of those socket numbers where host specific services can be obtained. I further suggest that the initial allocation be as follows:
Sockets Assignment
0-63 Network wide standard functions
64-127 Host specific functions
128-239 Reserved for future use
240-255 Any experimental function
> and within the network wide standard functions the following particular assignment be made:
So, internet port numbers in their current form are from 01971 (several years before the split between TCP and IP), and DNS is from about 01982.
In December of 01972, Postel published RFC 433 https://www.rfc-editor.org/rfc/rfc433.html, obsoleting the RFC 349 list with a list including chargen and some other interesting services:
Socket Assignment
1 Telnet
3 File Transfer
5 Remote Job Entry
7 Echo
9 Discard
19 Character Generator [e.g. TTYTST]
65 Speech Data Base @ ll-tx-2 (74)
67 Datacomputer @ cca (31)
241 NCP Measurement
243 Survey Measurement
245 LINK
The gap between 9 and 19 is unexplained.
RFC 503 https://www.rfc-editor.org/rfc/rfc503.html from 01973 has a longer list (including systat, datetime, and netstat), but also listing which services were running on which ARPANet hosts, 33 at that time. So RFC 503 contained a list of every server process running on what would later become the internet.
Skipping RFC 604, RFC 739 from 01977 https://www.rfc-editor.org/rfc/rfc739.html is the first one that shows the modern port number assignments (still called "socket numbers") for FTP and Telnet, though those presumably dated back a couple of years at that point:
Specific Assignments:
Decimal Octal Description References
------- ----- ----------- ----------
Network Standard Functions
1 1 Old Telnet [6]
3 3 Old File Transfer [7,8,9]
5 5 Remote Job Entry [10]
7 7 Echo [11]
9 11 Discard [12]
11 13 Who is on or SYSTAT
13 15 Date and Time
15 17 Who is up or NETSTAT
17 21 Short Text Message
19 23 Character generator or TTYTST [13]
21 25 New File Transfer [1,14,15]
23 27 New Telnet [1,16,17]
25 31 Distributed Programming System [18,19]
27 33 NSW User System w/COMPASS FE [20]
29 35 MSG-3 ICP [21]
31 37 MSG-3 Authentication [21]
Etc. This time I have truncated the list. It also has Finger on port 79.
You say, "My understanding is that DNS can potentially provide port numbers, but this is not widely used or supported." DNS SRV records have existed since 01996 (proposed by Troll Tech and Paul Vixie in RFC 2052 https://www.rfc-editor.org/rfc/rfc2052), but they're really only widely used in XMPP, in SIP, and in ZeroConf, which was Apple's attempt to provide the facilities of AppleTalk on top of TCP/IP.
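For reference, an SRV record carries priority, weight, port, and target host, so a client learns both where and on which port to connect. A hypothetical zone-file entry (names made up):

```
; _service._proto.name.          TTL   class type priority weight port  target
_xmpp-client._tcp.example.com.  86400  IN    SRV  0        5      5222  xmpp.example.com.
```

Had browsers consulted records like this from the start, the well-known-port registry would have mattered much less.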
• Full-duplex connections are probably a good idea, but certainly are not the only way, or the most obvious way, to create a reliable stream of data on top of an unreliable datagram layer. TCP itself also supports a half-duplex mode—even if one end sends FIN, the other end can keep transmitting as long as it wants. This was probably also a good idea, but it's certainly not the only obvious choice.
Much of that comes from the original applications being FTP and TELNET.
• Sequence numbers on messages or on bytes?
Bytes, because the whole TCP message might not fit in an IP packet. This is the MTU problem.
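Numbering bytes rather than messages also means the receiver can reassemble from segments of any size, so a sender is free to re-segment a retransmission after learning a smaller path MTU. A toy Python sketch of byte-offset reassembly (offsets and payloads made up):

```python
def reassemble(segments, total_len):
    """Rebuild a byte stream from (byte_offset, payload) segments, which
    may arrive out of order and overlap, as TCP's byte numbering allows."""
    buf = bytearray(total_len)
    covered = set()
    for offset, payload in segments:
        buf[offset:offset + len(payload)] = payload
        covered.update(range(offset, offset + len(payload)))
    if covered != set(range(total_len)):
        raise ValueError("stream has holes; keep waiting")
    return bytes(buf)

msg = b"hello, arpanet"
# The sender re-segmented part of the stream at a different size;
# byte offsets still line up, which per-message numbering wouldn't allow.
segments = [(0, msg[0:5]), (8, msg[8:]), (5, msg[5:10])]
assert reassemble(segments, len(msg)) == msg
```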
• Wouldn't it be useful to expose message boundaries to applications, the way 9P, SCTP, and some SNA protocols do?
Early on, there were some message-oriented, rather than stream-oriented, protocols on top of IP. Most of them died out. RDP was one such.
Another was QNet.[2]
Both still have assigned IP protocol numbers, but I doubt that an RDP packet would get very far across today's internet.
This was a lack. TCP is not a great message-oriented protocol.
• Do you really need urgent data?
The purpose of urgent data is so that when your slow Teletype is typing away, and the recipient wants it to stop, there's a way to break in. See [1], p. 8.
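The surviving API for this is MSG_OOB on Berkeley sockets. A minimal loopback demonstration in Python (this shows Linux behavior; urgent data is notoriously unportable, and the "urgent byte" is really just a mark in the stream):

```python
import socket
import time

# Loopback demonstration of TCP urgent data. With SO_OOBINLINE off
# (the default), the urgent byte is held aside by the kernel and
# fetched separately with MSG_OOB rather than in the ordinary stream.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket()
cli.connect(srv.getsockname())
conn, _ = srv.accept()
conn.settimeout(2.0)

cli.sendall(b"slow teletype output...")
cli.send(b"!", socket.MSG_OOB)   # the break-in byte, marked urgent
time.sleep(0.2)                  # let both segments arrive

urgent = conn.recv(1, socket.MSG_OOB)  # fetched out of band
normal = conn.recv(64)                 # ordinary stream, urgent byte excluded
print(urgent, normal)

for s in (cli, conn, srv):
    s.close()
```

In practice almost nothing but TELNET and FTP ever used this, and middleboxes handle it so inconsistently that modern protocols avoid it entirely.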
• It turns out that measuring round-trip time is really important for retransmission, and TCP has no way of measuring RTT on retransmitted packets, which can pose real problems for correcting a ridiculously low RTT estimate, which results in excessive retransmission.
Yes, reliable RTT is a problem.
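The standard fix that eventually landed is Karn's algorithm plus the RFC 6298 smoothed estimator: discard RTT samples from retransmitted segments (their ACKs are ambiguous) and keep an exponentially weighted mean and variance. A compact Python sketch:

```python
class RttEstimator:
    """RFC 6298 retransmission-timeout estimator plus Karn's rule:
    RTT samples from retransmitted segments are discarded, because
    their ACKs can't be matched to a particular transmission."""
    K, ALPHA, BETA, G = 4, 1 / 8, 1 / 4, 0.1  # G = clock granularity, seconds

    def __init__(self):
        self.srtt = None    # smoothed round-trip time
        self.rttvar = None  # round-trip time variation
        self.rto = 1.0      # initial RTO per RFC 6298

    def sample(self, r, retransmitted=False):
        if retransmitted:
            return  # Karn's algorithm: ambiguous sample, ignore it
        if self.srtt is None:
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(1.0, self.srtt + max(self.G, self.K * self.rttvar))

est = RttEstimator()
est.sample(0.2)                      # first sample seeds srtt and rttvar
est.sample(0.9, retransmitted=True)  # discarded under Karn's rule
```

Note the cost of Karn's rule: during a burst of retransmissions you learn nothing new about the path, which is exactly the "can't correct a bad estimate" problem described above; the timestamp option was the later workaround.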
• Do you really need a PUSH bit? C'mon.
It's another legacy thing to make TELNET work on slow links. Is it even supported any more?
• A modest amount of overhead in the form of erasure-coding bits would permit recovery from modest amounts of packet loss without incurring retransmission timeouts, which is especially useful if your TCP-layer protocol requires a modest amount of packet loss for congestion control, as TCP does.
• Also you could use a "congestion experienced" bit instead of packet loss to detect congestion in the usual case. (TCP did eventually acquire CWR and ECE, but not for many years.)
Originally, there was ICMP Source Quench for that, but Berkeley didn't put it in BSD, so nobody used it. Nobody was sure when to send it or what to do when it was received.
• The fact that you can't resume a TCP connection from a different IP address, the way you can with a Mosh connection, is a serious flaw that seriously impedes nodes from moving around the network.
That would require a security system to prevent hijacking sessions.
> The fact that you can't resume a TCP connection from a different IP address, the way you can with a Mosh connection, is a serious flaw that seriously impedes nodes from moving around the network
This 100% !! And basically the reason mosh had to be created in the first place (and it probably wasn't easy.) Unfortunately mosh only solves the problem for ssh. Exposing fixed IP addresses to the application layer probably doesn't help either.
So annoying that TCP tends to break whenever you switch wi-fi networks or switch from wi-fi to cellular. (On iPhones at least you have MPTCP, but that requires server-side support.)