The objective of this thesis is to implement efficient lock-based synchronization
by a novel, high performance, simple and scalable hardware technique that is easily
applicable to a shared-memory multiprocessor System-on-a-Chip (SoC). Our solution
is provided in the form of an intellectual property (IP) hardware unit which we call the
SoC Lock Cache (SoCLC). The SoCLC provides effective lock hand-off by reducing
on-chip memory traffic and improving performance in terms of lock latency, lock delay
and bandwidth consumption.
In our methodology, lock variables are accessed via SoCLC hardware. The SoCLC
consists of one-bit registers to store lock variables and associated control logic to
effectively implement the lock hand-off via interrupt generation, which eliminates
busy-wait problems. In this way, the SoCLC eliminates the use of the main memory
bus for unnecessary spinning and thus enables the memory bandwidth to be available
for other useful work.
Moreover, unlike related previous work in the literature, the SoCLC
does not require any special atomic assembly instructions (e.g., compare-and-swap,
test-and-set, load-linked/store-conditional instructions), extended cache protocol(s),
extra cache lines/tags or any other architectural modifications/extensions to the
processor core. Rather, the SoCLC methodology is a processor/memory/cache-hierarchy
independent solution.
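
The hand-off behavior described above can be illustrated with a minimal behavioral
sketch. This is a software toy model, not the SoCLC hardware itself: the class name
LockCacheModel, its methods, the first-come-first-served order of waiting processors
and the direct transfer of ownership on release are all assumptions made for
illustration.

```python
from collections import deque


class LockCacheModel:
    """Toy behavioral model of a lock cache (hypothetical; not the SoCLC RTL)."""

    def __init__(self, num_locks):
        self.lock_bits = [0] * num_locks        # one bit per lock variable
        self.owner = [None] * num_locks         # bookkeeping for this model only
        self.waiters = [deque() for _ in range(num_locks)]  # assumed FIFO order
        self.interrupts = []                    # log of (processor_id, lock_id)

    def acquire(self, pe, lock_id):
        """Grant the lock if it is free; otherwise record the waiter.

        A processor that is refused does not spin on the memory bus; it
        simply waits for an interrupt from the lock cache.
        """
        if self.lock_bits[lock_id] == 0:
            self.lock_bits[lock_id] = 1
            self.owner[lock_id] = pe
            return True
        if pe not in self.waiters[lock_id]:
            self.waiters[lock_id].append(pe)
        return False

    def release(self, pe, lock_id):
        """Release the lock; hand it to a waiter via an interrupt, if any."""
        if self.owner[lock_id] != pe:
            raise ValueError("only the current owner may release the lock")
        if self.waiters[lock_id]:
            next_pe = self.waiters[lock_id].popleft()
            self.owner[lock_id] = next_pe       # lock bit stays set (hand-off)
            self.interrupts.append((next_pe, lock_id))
            return next_pe
        self.lock_bits[lock_id] = 0
        self.owner[lock_id] = None
        return None
```

In this model, a refused acquire generates no further lock accesses until the
interrupt arrives, which is the property that frees memory bandwidth for other work.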
Our experimental results indicate that the SoCLC can achieve a 37% overall speedup
over a traditional locking mechanism in a microbenchmark program under high
contention on a four-processor system. Moreover, as memory latency increases, the
speedup of the SoCLC for the same microbenchmark also increases, reaching up to
107% for a memory latency of 33 clock cycles. We also examine the effects of false
sharing and of increased critical section (CS) length on locking performance.
Another set of experiments has been conducted with a database application program,
for which the SoCLC has been shown to achieve a speedup of 31% in overall execution
time.
To automate SoCLC design, we have also developed an SoCLC-generator tool,
PARLAK, that is capable of generating parameterized, synthesizable and user-specified
configurations of a custom SoCLC. Using PARLAK with 0.25µ TSMC technology and
a 10ns clock period, we have generated customized SoCLCs from a version for two
processors to a version for four processors occupying up to 37,940 gates of area for
256 lock variables. We have also generated customized SoCLCs for larger numbers of
processors with a 50ns clock period; e.g., an SoCLC version for 14 processors occupied
78,240 gates of area for 256 lock variables.
Furthermore, the SoCLC mechanism has been extended to support priority
inheritance with an immediate priority ceiling protocol (IPCP) implemented in hardware,
which enhances the hard real-time performance of the system. The experimental
results indicate that the SoCLC can achieve up to 43% overall speedup on practical
applications. In addition, it has been shown in a robot application that with the
IPCP mechanism integrated into the SoCLC, all of the tasks could meet their
deadlines (e.g., a high priority task with a 250µs worst case response time could
complete its execution in 93µs with the SoCLC, whereas the same task missed its
deadline by completing its execution in 283µs without the SoCLC). Therefore, with
IPCP support, our solution can provide better real-time guarantees for real-time systems.
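
As a rough illustration of the immediate priority ceiling rule mentioned above, the
following sketch raises a task's priority to the ceiling of a lock as soon as the lock
is taken. It is a software model only and says nothing about how the SoCLC implements
IPCP in hardware. The names IpcpTask, lock_ceiling, ipcp_lock and ipcp_unlock, and the
convention that a larger number means a higher priority, are assumptions made for
illustration.

```python
class IpcpTask:
    """Hypothetical task record; a larger number means a higher priority."""

    def __init__(self, name, base_priority):
        self.name = name
        self.base_priority = base_priority
        self.held_ceilings = []  # ceilings of the locks currently held

    @property
    def active_priority(self):
        # The task runs at the highest ceiling among the locks it holds,
        # or at its base priority when it holds none.
        return max([self.base_priority] + self.held_ceilings)


def lock_ceiling(users):
    """Ceiling of a lock: the highest base priority among tasks that use it."""
    return max(task.base_priority for task in users)


def ipcp_lock(task, ceiling):
    """Taking the lock immediately raises the task to the lock's ceiling."""
    task.held_ceilings.append(ceiling)


def ipcp_unlock(task):
    """Releasing the most recently taken lock restores the earlier priority."""
    task.held_ceilings.pop()
```

With a low priority task (priority 1) and a high priority task (priority 3) sharing
one lock, the ceiling is 3, so the low priority task runs at priority 3 while it holds
the lock and cannot be preempted by intermediate priority tasks during its critical
section.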