CS-534: Packet Switch Architecture
Spring 2015 |
Department of Computer Science
© copyright: University of Crete, Greece |
[Lectures: 4.2 Input Queueing, 4.4 CIOQ] | [print version, in PDF] |
Using the same technique, calculate the saturation throughput of a 3x3 input queued switch (single queue per input). The head-of-line cells of the three input queues may form 27 combinations. List these combinations; compute the probability of each; compute the average per-output throughput of the switch in each of these cases (and briefly explain); and, finally, compute the average saturation throughput.
You will simulate a single-stage (VOQ) crossbar under randomly-destined traffic, comprising unit-length packets (cells). In BookSim, you can run a simulation specifying a configuration file in the following way.
> ./booksim config.txt injection_rate=0.7In addition, through the command line, you may pass parameter values, which will override the corresponding values in the configuration file -- in the example above, we override the "injection_rate", which is the load at each input.
Here is a sample configuration file.
> cat config.txt topology = fly; // k-ary n-fly = ~ banyan k = 16; // 16 ports per switch n = 1; // one stage router = iq; // switch type input queueing noq = 1; // use VOQs at each input num_vcs = 16; // number of queues per input; set it equal to k for noq = 1 // if buffer_policy = private & buf_size = -1, then vc_buf_size = vc_buf_size // if buffer_policy = shared & buf_size >= 0, then buf_size = buf_size buffer_policy = private; // or shared buf_size = -1; // -1 for buffer_policy = private; vc_buf_size = 1024; // size of each VOQ with buffer_policy = private internal_speedup = 1.0; // crossbar internal speedup -- expects a real number output_buffer_size = -1; // means unlimited output buffer size; used when speedup > 1.0 sw_allocator = islip; // pim alloc_iters = 1; // iterations of scheduling PIM/iSLIP algorithms; warning: is it broken? // traffic pattern packet_size = 1; // each packet consists of 1 flit traffic = uniform; // Bernoulli iid uniformly-destined traffic. injection_rate = 0.7; // misc sample_period = 100000; routing_function = dest_tag; use_read_write = 0; routing_delay = 0; vc_alloc_delay = 1; sw_alloc_delay = 1; st_final_delay = 1; private_buf_size = -1;In this configuration, each VOQ has separate (private) buffer space for 1024 packets.
To generate a load-latency curve for one switch configuration, run multiple simulations for increasing load values between [0.1:0.99]; for each run, collect the "Network latency average" output of the simulator, and create a load/latency (2d) matrix, which you can plot using e.g. gnuplot.
Generate the load-latency curves for PIM and iSLIP, for 16x16, 32x32, and 64x64 crossbars. Compare and comment on the curves you obtained.
Next, considering the 32x32 switch, obtain the load-latency curves for PIM and iSLIP for 1, 2 and 4 scheduling iterations (parameter "alloc_iters"). Are the results improving with more iterations? If not, check whether the simulator really utilized the alloc_iters parameter value that you specified. If it didn't, try to manually change it in the C++ files of PIM and iSLIP, which are located in directory "src/allocators".
Next, consider a 32x32 switch using one iterations of iSLIP (alloc_iters = 1), under full input load ("injection_rate" = 1.0). In this test, assume that the buffer space at each input of the switch is shared by the local (at this input) VOQs. To do so, set "buffer_policy = shared" and play with the shared buffer size (parameter "buf_size"). Plot the "Accepted packet rate average" for increasing buffer size {1,2, 8, 32, 128, 1024}. Why iSLIP cannot achieve full throughput for all buffer sizes? Repeat the experiment using PIM instead of iSLIP. Is PIM worse than iSLIP? Finally, repeat the aforementioned experiments using an internal speedup of 2. Comment on the results. Which one of the following do you think is more cost-effective for a 32x32 switch chip implementation, large input buffers, multiple scheduling iterations or internal speedup?
[2] N. McKeown: "The iSLIP Scheduling Algorithm for Input-Queued Switches", IEEE/ACM Trans. on Networking, vol. 7, no. 2, April 1999, pp. 188-201.
Up to the Home Page of CS-534
|
© copyright
University of Crete, Greece.
Last updated: 4 April 2015 by N. Chrysos; earlier versions by G. Passas, A. Psathakis, and M. Katevenis. |