Assume now that the off-chip SRAM chip(s) are pipelined, "zero-bus-turnaround" (ZBT) SRAM's, like the Micron MT55L256L36P, 8 Mbit ZBT SRAM that we saw in class. In that case, dependent off-chip accesses can occur, at the earliest, three cycles apart: if address A1 is valid on the address bus on clock edge number 1, the read data D(A1) will be valid on the data bus on clock edge number 3; assuming that the controller chip is fast enough, it can then drive these data D(A1) back onto the address bus during the immediately succeeding clock cycle --the clock cycle between edges 3 and 4-- so that, after the drive delay, they are valid and stable on clock edge number 4.
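The following small C sketch (ours, purely illustrative; the chain length k and the function names are our assumptions, not part of the exercise) mechanizes this timing rule for a chain of serially dependent ZBT reads, where each address after the first is the data returned by the previous read:

    #include <stdio.h>

    /* Timing rule from the text: an address valid on clock edge e yields
       read data valid on edge e + 2, and a dependent address can be valid,
       at the earliest, on edge e + 3. */
    static int addr_valid_edge(int i) { return 1 + 3 * (i - 1); }       /* i-th access, i >= 1 */
    static int data_valid_edge(int i) { return addr_valid_edge(i) + 2; } /* = 3 * i */

    int main(void) {
        int k = 3; /* illustrative chain length, e.g. three pointer-chasing reads */
        for (int i = 1; i <= k; i++)
            printf("access %d: address valid on edge %d, data valid on edge %d\n",
                   i, addr_valid_edge(i), data_valid_edge(i));
        /* With no operation overlap, the next operation's first address can
           be valid on edge 3k + 1, i.e. 3k cycles after edge 1. */
        printf("next operation can start on edge %d (latency %d cycles)\n",
               3 * k + 1, 3 * k);
        return 0;
    }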
Under this new assumption, recalculate all the latency values for the enqueue and dequeue operations in all the system configurations shown on that 4th transparency. To be precise, use the following definition of latency: the latency of an operation is the time, in clock cycles, between the initiations of two successive similar operations, assuming that no operation overlap occurs, i.e. that the first memory access of the second operation occurs after the last memory access of the first operation.
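As a worked instance of this definition (the chain length k is our own symbol, not one from the transparencies): for an operation that consists of k serially dependent ZBT reads, the i-th address is valid on edge 1 + 3(i-1) and the last read data on edge 3k, so

    \text{latency} \;=\; 3(k-1) + 2 + 1 \;=\; 3k \ \text{clock cycles},

where 3(k-1) cycles separate the first address from the last, 2 more cycles bring the last read data, and 1 final cycle lets the next operation drive its own first address.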
For each operation (enqueue or dequeue) and for each of the system configurations above, show the schedule of memory accesses, i.e. which memory(ies) are accessed during each clock cycle, and at which address. For the ZBT off-chip SRAM's, the cycle during which you show an access is the cycle during which the controller chip drives the address on the address bus; the address is then valid on the clock edge that terminates that clock cycle.
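For instance, a hypothetical chain of two dependent reads (where the second address A2 is the data D(A1) returned by the first) would be shown as follows under this convention:

    cycle (ends on edge)    address bus                 data bus
    1  (edge 1)             A1 driven, valid edge 1     --
    2  (edge 2)             --                          --
    3  (edge 3)             --                          D(A1) valid edge 3
    4  (edge 4)             A2 = D(A1), valid edge 4    --
    5  (edge 5)             --                          --
    6  (edge 6)             --                          D(A2) valid edge 6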
In this exercise, you will study an alternative scheme: the packets that belong to the same queue are stored contiguously, one after the other; whenever a packet ends before the end of a memory block, the next packet of the queue starts at the very next byte of that same block. Thus, the memory block size may be relatively large, because the only unused space is at the beginning of the first block and at the end of the last block of each queue --not at the end of the last block of each packet, as in the other scheme. (If the number of queues is very large, on the same order as the number of packets in the buffer memory, then this scheme makes little sense.)
Assume that no special marker or pointer is needed to indicate the end of each packet (and hence the beginning of the next packet), because each packet carries enough information to determine its precise length (e.g. a length field, or a special end-of-packet pattern inside its data). This is important: the number of packet boundaries inside a single memory block can be large, hence keeping end-of-packet markers or pointers in the queue data structures would be awkward. Thus, the only management data needed per memory block is a pointer to the next block of its queue. The head and tail pointers of each queue, on the other hand, need to be full pointers into the buffer memory, pointing to a specific byte position inside a specific block --not merely to (the beginning of) a block.
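To make this management data concrete, here is one possible C rendering (ours; the names, the block size, and the field types are illustrative assumptions, not given by the exercise):

    #include <stdint.h>

    #define BLOCK_BYTES 2048 /* illustrative block size */

    /* Per-block management data: only a next-block pointer. Packet
       boundaries inside the block are NOT recorded here; they are
       recovered from the length information inside each packet. */
    struct block {
        struct block *next;              /* next block of the same queue, or free-list link */
        uint8_t       data[BLOCK_BYTES];
    };

    /* Per-queue management data: head and tail must be full byte pointers,
       i.e. (block, byte offset) pairs, because packets start and end at
       arbitrary byte positions inside a block. */
    struct queue {
        struct block *head_blk; uint32_t head_off; /* first byte of the oldest packet */
        struct block *tail_blk; uint32_t tail_off; /* first free byte after the newest packet */
    };

    /* Free blocks need no byte offsets: a singly linked list of whole blocks. */
    struct free_list {
        struct block *top;
    };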
Analyze the time cost of this scheme: is it better than, worse than, or similar to the cost of the "usual" scheme? By "time cost" we mean the number of management operations and memory accesses needed when enqueueing an incoming packet into a queue or dequeueing a departing packet from a queue. Include the free-list operations in your analysis. Analyze both the worst case and the average case. Try to come up with equations similar to those on the 6th transparency of section 3.5, and explain your reasoning.
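As a starting point only (all symbols below are ours, and the exact constants depend on the configuration): let P be the packet length in bytes, B the block size, W the bytes transferred per memory access, and o the byte offset at which the packet starts inside its first block. The packet then spans

    s = \lceil (o + P) / B \rceil \ \text{blocks}, \qquad
    s \le \lceil (P-1)/B \rceil + 1 \ \text{(worst case)}, \qquad
    E[s] \approx P/B + 1 \ \text{(average case over } o\text{)},

so an enqueue costs roughly \lceil P/W \rceil data writes, plus s - 1 free-list allocations and next-pointer writes (the packet begins inside the already-allocated tail block), plus one tail-pointer update; a dequeue is symmetric, with one head-pointer update and one free-list return for every block whose last byte the head pointer advances past. The comparison against the "usual" scheme then hinges on how s compares to the \lceil P/B \rceil blocks that the "usual" scheme allocates and frees per packet.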