3: Transport Layer 3b-1
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
- full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
- connection-oriented:
  - handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
- flow controlled:
  - sender will not overwhelm receiver
- point-to-point:
  - one sender, one receiver
- reliable, in-order byte stream:
  - no "message boundaries"
- pipelined:
  - TCP congestion and flow control set window size
- send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; a segment carries it to the TCP receive buffer, from which the other application reads data through its socket door.]
TCP segment structure
Header layout, each row 32 bits wide:
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | rcvr window size
  checksum | ptr urgent data
  Options (variable length)
  application data (variable length)
Field notes:
- sequence number, acknowledgement number: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- rcvr window size: # bytes rcvr willing to accept
- checksum: Internet checksum (as in UDP)
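The header layout above can be decoded with a few lines of Python. This is a minimal sketch, not a full parser: it reads only the fixed 20-byte part of the header, and the function name `parse_tcp_header` and the hand-built example segment are assumptions made for illustration.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len is counted in 32-bit words
    flags = offset_flags & 0x3F             # low six bits hold U A P R S F
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # from bit 0 upward
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "window": window, "checksum": checksum, "urgent_ptr": urgent_ptr,
        "data": segment[header_len:],       # application data follows the header
    }

# Hypothetical segment built by hand: head len = 5 words, SYN and ACK bits set.
example = struct.pack("!HHIIHHHH", 80, 50000, 1000, 2001,
                      (5 << 12) | 0x12, 4096, 0, 0) + b"hi"
parsed = parse_tcp_header(example)
```

The checksum is left at zero in the example, so the sketch does not verify it.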
TCP seq. #'s and ACKs
Seq. #'s:
- byte stream "number" of first byte in segment's data
ACKs:
- seq # of next byte expected from other side
- cumulative ACK
Q: how does the receiver handle out-of-order segments?
- A: TCP spec doesn't say; it is up to the implementor
[Figure: simple telnet scenario, Host A and Host B over time. User types 'C'; host ACKs receipt of 'C', echoes back 'C'; host ACKs receipt of echoed 'C'.]
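The telnet exchange can be traced with the two rules above (seq # is the number of the first data byte, ACK # is the next byte expected). This is a sketch under stated assumptions: the starting values 42 and 79 and the helper `reply` are chosen for illustration only.

```python
def reply(seq_in, ack_in, data_len):
    """Return (seq, ack) of the segment sent in response to a received one."""
    seq_out = ack_in               # send the byte the peer expects next
    ack_out = seq_in + data_len    # expect the byte after the data just received
    return seq_out, ack_out

# Assumed starting point: Host A's next byte is 42, Host B's next byte is 79.
a_seq, a_ack = 42, 79                       # A sends 'C' (one byte of data)
b_seq, b_ack = reply(a_seq, a_ack, 1)       # B ACKs 'C' and echoes back 'C'
a2_seq, a2_ack = reply(b_seq, b_ack, 1)     # A ACKs the echoed 'C'
```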
TCP: reliable data transfer
Simplified sender, assuming:
- one way data transfer
- no flow, congestion control
[Figure: sender state "wait for event" with three transitions]
- event: data received from application above -> create, send segment
- event: timer timeout for segment with seq # y -> retransmit segment
- event: ACK received, with ACK # y -> ACK processing
TCP: reliable data transfer
Simplified TCP sender

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch (event) {
    event: data received from application above
      create TCP segment with sequence number nextseqnum
      start timer for segment nextseqnum
      pass segment to IP
      nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
      retransmit segment with sequence number y
      compute new timeout interval for segment y
      restart timer for sequence number y
    event: ACK received, with ACK field value of y
      if (y > sendbase) { /* cumulative ACK of all data up to y */
        cancel all timers for segments with sequence numbers < y
        sendbase = y
      }
      else { /* a duplicate ACK for already ACKed segment */
        increment number of duplicate ACKs received for y
        if (number of duplicate ACKs received for y == 3) {
          /* TCP fast retransmit */
          resend segment with sequence number y
          restart timer for segment y
        }
      }
  }
} /* end of loop forever */
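The event loop can be tried out as a small Python class. This is a sketch only: timers are left out (the caller invokes `on_timeout` directly), the `send` callback stands in for "pass segment to IP", and the class and method names are assumptions, not part of the slides.

```python
class SimplifiedSender:
    def __init__(self, initial_seq, send):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.send = send      # hypothetical callback: send(seq, data)
        self.unacked = {}     # seq -> data for segments sent but not yet ACKed
        self.dup_acks = 0     # duplicate ACKs seen since sendbase last advanced

    def on_data(self, data):
        """Event: data received from application above."""
        seq = self.nextseqnum
        self.unacked[seq] = data
        self.send(seq, data)
        self.nextseqnum += len(data)

    def on_timeout(self, y):
        """Event: timer timeout for segment with sequence number y."""
        if y in self.unacked:
            self.send(y, self.unacked[y])

    def on_ack(self, y):
        """Event: ACK received, with ACK field value of y."""
        if y > self.sendbase:             # cumulative ACK of all data up to y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.sendbase = y
            self.dup_acks = 0
        else:                             # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.send(y, self.unacked[y])   # TCP fast retransmit

sent = []
s = SimplifiedSender(100, lambda seq, data: sent.append((seq, data)))
s.on_data(b"abcd")    # segment with seq 100
s.on_data(b"efgh")    # segment with seq 104
s.on_ack(104)         # cumulative ACK covering the first segment
for _ in range(3):
    s.on_ack(104)     # three duplicate ACKs trigger a resend of seq 104
```

After this run, `sent` holds the two original segments followed by one fast retransmission of the segment with sequence number 104.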
TCP ACK generation [RFC 1122, RFC 2581]

Event                                       TCP receiver action
in-order segment arrival, no gaps,          delayed ACK: wait up to 500 ms for next
everything else already ACKed               segment; if no next segment, send ACK

in-order segment arrival, no gaps,          immediately send single cumulative ACK
one delayed ACK pending

out-of-order segment arrival,               send duplicate ACK, indicating seq. # of
higher-than-expected seq. #, gap detected   next expected byte

arrival of segment that partially or        immediate ACK if segment starts at lower
completely fills gap                        end of gap
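The four table rows can be read as a decision function. The sketch below is one possible encoding, assuming the arriving segment starts either at or above the next expected byte; the function name, parameters, and returned strings are hypothetical.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, has_gap):
    """Pick the receiver's ACK action for an arriving segment.

    seg_seq      -- sequence number of the arriving segment
    expected_seq -- next in-order byte the receiver is waiting for
    ack_pending  -- True if one delayed ACK is already pending
    has_gap      -- True if out-of-order data is buffered above expected_seq
    """
    if seg_seq > expected_seq:
        return "duplicate ACK"              # out-of-order arrival, gap detected
    if has_gap:
        return "immediate ACK"              # segment starts at lower end of gap
    if ack_pending:
        return "immediate cumulative ACK"   # covers this and the pending segment
    return "delayed ACK"                    # wait up to 500 ms for next segment
```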
TCP: retransmission scenarios
[Figure: two Host A / Host B timelines. Left: lost ACK scenario, with a loss (X) followed by a timeout at Host A. Right: premature timeout with cumulative ACKs, showing the Seq=92 timeout and the Seq=100 timeout.]
TCP Flow Control
flow control: sender won't overrun receiver's buffers by transmitting too much, too fast
receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
- RcvWindow field in TCP segment
sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow
[Figure: receiver buffering]
RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer
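Both sides of the rule can be written as two small functions. This is a sketch, assuming byte counters named `last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, and `last_byte_acked`; those names do not appear on the slides, and the sender check treats "equal to RcvWindow" as still allowed.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room = buffer size minus data not yet read by the app."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: keep transmitted, unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window

window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok_small = can_send(1000, last_byte_sent=5000, last_byte_acked=4000, advertised_window=window)
ok_large = can_send(1500, last_byte_sent=5000, last_byte_acked=4000, advertised_window=window)
```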
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
- longer than RTT
  - note: RTT will vary
- too short: premature timeout
  - unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions, cumulatively ACKed segments
- SampleRTT will vary, want estimated RTT "smoother"
  - use several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
- exponential weighted moving average
- influence of given sample decreases exponentially fast
- typical value of x: 0.1
Setting the timeout
- EstimatedRTT plus "safety margin"
- large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
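The three formulas can be applied sample by sample. The sketch below makes two assumptions the slides do not state: EstimatedRTT starts at the first sample with Deviation 0, and Deviation is computed with the already updated EstimatedRTT (the order in which the formulas are listed). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample, x=0.1):
        self.x = x                          # typical value of x: 0.1
        self.estimated_rtt = first_sample   # assumed initialization
        self.deviation = 0.0                # assumed initialization

    def update(self, sample_rtt):
        """Fold one SampleRTT into the estimates and return the new timeout."""
        x = self.x
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        self.deviation = (1 - x) * self.deviation + x * abs(sample_rtt - self.estimated_rtt)
        return self.timeout()

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation

est = RttEstimator(first_sample=100.0)   # values in milliseconds
t = est.update(200.0)                    # one unusually slow sample
```

With these inputs, EstimatedRTT moves from 100 to 110, Deviation becomes 9, and the timeout is 110 + 4*9 = 146 ms.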
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
- initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
    Socket clientSocket = new Socket("hostname", portNumber);
- server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client end system sends TCP SYN control segment to server
- specifies initial seq #
Step 2: server end system receives SYN, replies with SYNACK control segment
- ACKs received SYN
- allocates buffers
- specifies server-to-receiver initial seq. #
Step 3: client end system receives SYNACK, replies with ACK segment
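The sequence and ACK numbers carried by the handshake segments can be listed explicitly. This is an illustrative sketch: the function name and the initial sequence numbers 1000 and 5000 are assumptions, and the final client ACK is included to complete the three-way exchange.

```python
def three_way_handshake(client_isn, server_isn):
    """Return (sender, flags, seq, ack) for each segment of connection setup."""
    syn = ("client", "SYN", client_isn, None)                   # specifies initial seq #
    synack = ("server", "SYN+ACK", server_isn, client_isn + 1)  # the SYN uses up one seq #
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)     # ACKs the server's SYN
    return [syn, synack, ack]

segments = three_way_handshake(client_isn=1000, server_isn=5000)
```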
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: client close, server close, client timed wait, closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
- Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline: client closing, server closing, client timed wait, both closed]
TCP Connection Management (cont.)
[Figures: TCP client lifecycle; TCP server lifecycle]
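The closing steps can be sketched as two transition tables, one per side. The state names follow RFC 793 and are not recoverable from the figures above; the event labels and the helper `run` are assumptions made for illustration, and only the close path is modeled.

```python
CLIENT_CLOSE = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",      # Step 1: client sends FIN
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",    # server ACKed the FIN
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",     # Step 3: reply with ACK, timed wait
    ("TIME_WAIT", "timer expires"): "CLOSED",
}

SERVER_CLOSE = {
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",   # Step 2: reply with ACK
    ("CLOSE_WAIT", "close"): "LAST_ACK",         # Step 2: server sends its FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",          # Step 4: connection closed
}

def run(table, state, events):
    """Follow the transition table through a list of events."""
    for event in events:
        state = table[(state, event)]
    return state

client_end = run(CLIENT_CLOSE, "ESTABLISHED",
                 ["close", "recv ACK", "recv FIN", "timer expires"])
server_end = run(SERVER_CLOSE, "ESTABLISHED", ["recv FIN", "close", "recv ACK"])
```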
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
Figure annotations:
- large delays when congested
- maximum achievable throughput
Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packet
Causes/costs of congestion: scenario 2
- always: $\lambda_{in} = \lambda_{out}$ (goodput)
- "perfect" retransmission only when loss: $\lambda'_{in} > \lambda_{out}$ ($\lambda'_{in}$: offered load including retransmissions)
- retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
"costs" of congestion:
- more work (retrans) for given "goodput"
- unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ (offered load including retransmissions) increase?
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
- when packet dropped, any upstream transmission capacity used for that packet was wasted!
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
Network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
Case study: ATM ABR congestion control
ABR: available bit rate:
- "elastic service"
- if sender's path "underloaded":
  - sender should use available bandwidth
- if sender's path congested:
  - sender throttled to minimum guaranteed rate
RM (resource management) cells:
- sent by sender, interspersed with data cells
- bits in RM cell set by switches ("network-assisted")
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
- RM cells returned to sender by receiver, with bits intact
Case study: ATM ABR congestion control
- two-byte ER (explicit rate) field in RM cell
  - congested switch may lower ER value in cell
  - sender's send rate is thus the minimum supportable rate on path
- EFCI bit in data cells: set to 1 in congested switch
  - if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell
TCP Congestion Control
- end-end control (no network assistance)
- transmission rate limited by congestion window size, Congwin, over segments
[Figure: window of Congwin segments]
- w segments, each with MSS bytes, sent in one RTT:
  throughput = (w * MSS) / RTT bytes/sec
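A quick numeric check of the throughput formula, as a sketch. The input values (w = 10 segments, MSS = 1460 bytes, RTT = 100 ms) are assumed for illustration and do not come from the slides.

```python
def tcp_throughput(w, mss_bytes, rtt_seconds):
    """Bytes per second when w segments of MSS bytes are sent in one RTT."""
    return w * mss_bytes / rtt_seconds

rate = tcp_throughput(w=10, mss_bytes=1460, rtt_seconds=0.1)   # about 146,000 bytes/sec
```

Doubling w doubles the rate, which is why the congestion window is the quantity TCP adjusts.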
TCP congestion control:
- two "phases"
  - slow start
  - congestion avoidance
- important variables:
  - Congwin
  - threshold: defines threshold between slow start phase and congestion avoidance phase
- "probing" for usable bandwidth:
  - ideally: transmit as fast as possible (Congwin as large as possible) without loss
  - increase Congwin until loss (congestion)
  - loss: decrease Congwin, then begin probing (increasing) again
TCP Slowstart
- exponential increase (per RTT) in window size (not so slow!)
- loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
Slowstart algorithm:
    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (loss event OR Congwin > threshold)
[Figure: Host A / Host B timeline over successive RTTs]
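The slowstart loop can be traced ACK by ACK to see the per-RTT doubling. This is a sketch assuming no loss event and one ACK per segment sent; the function name `slow_start` is hypothetical.

```python
def slow_start(threshold):
    """Return Congwin at the start of each RTT until it exceeds threshold."""
    congwin = 1
    per_rtt = [congwin]
    while congwin <= threshold:
        acks = congwin                # one ACK arrives per segment sent this RTT
        for _ in range(acks):
            congwin += 1              # Congwin++ for each segment ACKed
            if congwin > threshold:   # until Congwin > threshold
                break
        per_rtt.append(congwin)
    return per_rtt

trace = slow_start(threshold=8)       # [1, 2, 4, 8, 9]
```

The window doubles each RTT (1, 2, 4, 8) and slowstart ends as soon as the first ACK pushes Congwin past the threshold.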
TCP Congestion Avoidance
Congestion avoidance:
    /* slowstart is over */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart [1]
[1] TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
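Slowstart and congestion avoidance can be combined into a per-RTT trace of Congwin. This is a simplified Tahoe-style sketch, not Reno: every loss event sends the window back to 1. The function name, the integer halving, and the set of RTTs with loss events are assumptions made for illustration.

```python
def tahoe_trace(threshold, loss_rtts, n_rtts):
    """Return Congwin at the start of each RTT; loss_rtts holds RTT indices with a loss."""
    congwin, trace = 1, []
    for rtt in range(n_rtts):
        trace.append(congwin)
        if rtt in loss_rtts:
            threshold = max(congwin // 2, 1)            # threshold = Congwin/2
            congwin = 1                                 # perform slowstart again
        elif congwin <= threshold:
            congwin = min(2 * congwin, threshold + 1)   # slowstart: stop once past threshold
        else:
            congwin += 1                                # avoidance: +1 per window of ACKs
    return trace

trace = tahoe_trace(threshold=8, loss_rtts={7}, n_rtts=14)
```

With these inputs the window grows 1, 2, 4, 8, then linearly to 12; the loss sets the threshold to 6 and the window restarts from 1.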
TCP Fairness
Fairness goal: if N TCP sessions share same bottleneck link, each should get 1/N of link capacity
TCP congestion avoidance:
- AIMD: additive increase, multiplicative decrease
  - increase window by 1 per RTT
  - decrease window by factor of 2 on loss event
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with both axes running to R and the "equal bandwidth share" line; axis label: Connection 1 throughput. The trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
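The fairness argument can be checked numerically. The sketch below assumes both sessions have the same RTT and both see a loss event whenever their combined rate exceeds the capacity R; the function name and starting rates are hypothetical.

```python
def aimd(x1, x2, capacity, rounds):
    """Run two AIMD sessions sharing one bottleneck and return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1    # congestion avoidance: additive increase
    return x1, x2

r1, r2 = aimd(x1=10.0, x2=80.0, capacity=100.0, rounds=200)
gap = abs(r1 - r2)                     # started at 70
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the rates drift toward the equal bandwidth share.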
Chapter 3: Summary
- principles behind transport layer services:
  - multiplexing/demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation and implementation in the Internet
  - UDP
  - TCP
Next:
- leaving the network "edge" (application, transport layers)
- into the network "core"