ElectronicsNews

The design of the NoC is key to the success of large, high-performance compute SoCs

As chips become ever more complex to handle the increasing number of calculations carried out every second, keeping the vast quantities of data flowing around the chip in a timely manner becomes a significant challenge. Sondrel explains that designers often overlook the vital importance of this data flow because the design of the Network on Chip (NoC), which is responsible for it, is complex. It is also hard to verify that the performance requirements are met in all circumstances because there are many corner cases. The result is sub-optimal data transfers by the NoC and an SoC that fails to deliver.
“The performance of the NoC must match that of the compute part of the SoC,” explained Ben Fletcher, Sondrel’s Director of Engineering. “The NoC’s function is to supply the input data fast enough to keep the compute IPs on the chip running at their maximum capacity and to store the output data so that the system does not become blocked. We use Arteris® FlexNoC® IP as the NoC communications backbone for the SoC because it enables us to tape out ever more complex chips in less time.”
Why FlexNoC?
He identified four specific benefits of using FlexNoC interconnect technology. The first is the ability to reduce area and wire count. This is done by leveraging the transport layer packetisation and serialisation capabilities so that the NoC architect can precisely control which parts of the NoC benefit from reduced wire count and area without compromising on performance requirements. The second is reduced power consumption: power management features, such as options to configure clock domain crossings and clock-gating support, help to ensure that the power consumption is well within the power budget. The third is the ability to create a physically-aware design. The design teams are able to hand over a netlist to the backend team that is guaranteed to meet timing because the NoC design methodology considers the SoC floorplan and any physical design constraints right from the start of the design. Lastly, FlexNoC has advanced configuration tooling with an excellent UI. The suite of tools provided to generate a performant, timing-clean interconnect is intuitive and easy for NoC architects to familiarise themselves with, thereby improving productivity.
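The trade-off behind serialisation can be illustrated with a short Python sketch. It is not based on the FlexNoC tooling: the function name, the packet size and the link widths are all hypothetical, and the model counts payload wires only, ignoring header and handshake overhead. It simply shows that a narrower link uses fewer wires but needs more cycles to move the same packet, which is why the architect applies it only where the performance requirements allow.

```python
import math


def cycles_per_packet(packet_bits: int, link_width_bits: int) -> int:
    """Cycles needed to send one packet over a link of the given width.

    Simplified model (an assumption of this sketch): one link-width
    chunk is transferred per cycle, with no header or stall cycles.
    """
    if packet_bits <= 0 or link_width_bits <= 0:
        raise ValueError("packet and link sizes must be positive")
    return math.ceil(packet_bits / link_width_bits)


# Hypothetical 512-bit packet sent over progressively narrower links.
PACKET_BITS = 512
for width in (512, 128, 32):
    cycles = cycles_per_packet(PACKET_BITS, width)
    print(f"{width:>3} data wires: {cycles} cycle(s) per packet")
```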
What does a NoC do?
NoCs interconnect almost every part of an SoC. They are intrinsically linked to the chip’s floorplan, architecture, functional requirements, startup, security, safety and many other aspects. “This means that there can be a high likelihood that the floorplan will change through the life of the project, which requires changes to the NoC. These changes then impact the floorplan, creating a feedback loop that can cause delays and cost overruns,” warns Ben Fletcher. “Over the years of designing large, complex SoCs, we have developed a number of techniques that allow us to carry out performance exploration and verification early in the process. By securing the requirements early on and being able to quickly verify that NoC spins meet those requirements, we can stabilise the floorplan and the NoC to reduce unnecessary churn in the design, which reduces risk and unnecessary costs.”

Example of a NoC, shown in blue on the functional block diagram and in blue on the floorplan
On the functional block diagram in the illustration, the NoC initially appears deceptively simple: just a lot of connections. However, the floorplan shows that it takes up a sizable area with a complicated layout. It has to achieve high clock speeds to move vast amounts of data across the chip while connecting to the spread-out physical locations of the IP blocks, which makes timing difficult to close.
Floorplan or NoC first?
Usually, designers start a chip design with either the floorplan or the NoC, which results in the feedback loop described previously. Sondrel’s approach avoids this by carrying out Performance Exploration at a very early stage to stabilise the performance requirements and thereby stabilise and test the architecture. This reduces the likelihood of changes, which in turn stabilises the floorplan and the NoC. Performance Exploration addresses the problem that IP blocks are typically verified in isolation, which does not take into account their interaction with other IP blocks. The more IP blocks there are on a chip, the harder it is to appreciate all the dependencies between them that can seriously affect the performance the chip can achieve. For example, master/slave interfaces may not match, IP blocks may conflict over shared memory, or there may be clocking differences. Further details can be found in Sondrel’s white paper on ‘The 10 practical steps to model and design a complex SoC’ at www.sondrel.com/solutions/white-papers
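As a rough illustration of why interactions matter, the Python sketch below adds up the bandwidth that several IP blocks request from one shared memory and compares the total with what the memory can deliver. It is not Sondrel’s Performance Exploration flow: the block names, the bandwidth figures and the function name are hypothetical, and a real study would also model latency, burstiness and arbitration. Each block could pass verification in isolation while the combination still oversubscribes the memory.

```python
from typing import Mapping


def memory_utilisation(demands_gbps: Mapping[str, float], peak_gbps: float) -> float:
    """Fraction of the shared memory's peak bandwidth that is requested."""
    if peak_gbps <= 0:
        raise ValueError("peak bandwidth must be positive")
    return sum(demands_gbps.values()) / peak_gbps


# Hypothetical sustained demands in GB/s for four IP blocks.
demands = {"cpu": 8.0, "gpu": 24.0, "isp": 12.0, "npu": 30.0}
PEAK_GBPS = 64.0  # hypothetical peak bandwidth of the shared memory

utilisation = memory_utilisation(demands, PEAK_GBPS)
print(f"aggregate demand: {sum(demands.values()):.1f} GB/s")
print(f"utilisation of shared memory: {utilisation:.0%}")
if utilisation > 1.0:
    print("demand exceeds what the shared memory can supply")
```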
Having done the Performance Exploration to obtain the performance requirements, there is now enough information to start configuring the NoC. What is needed is a means of testing the generated RTL against those requirements to determine how well it meets them, and then of iterating quickly to reach the required performance point. For this, Sondrel has developed a proprietary testbench called Performance Verification Environment. This uses the synthesisable RTL, not an approximate model of it, and the processors and subsystems are replaced with transactors defined in Python code. As a result, memory-mapped bus traffic is generated in Python and driven through the NoC, making it quick to see what is going on in the design and how changes improve the data traffic flow. Further details can be found in Sondrel’s white paper on ‘Comparing Performance Verification Environment and RTL’ at www.sondrel.com/solutions/white-papers
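The idea of a Python-defined transactor can be sketched as follows. This is not Sondrel’s proprietary Performance Verification Environment, whose interfaces are not described here. The class, function and parameter names are hypothetical, and the sketch only produces a reproducible stream of memory-mapped read and write requests. In a real testbench, requests like these would be driven onto the RTL bus interface in place of a processor or subsystem.

```python
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Transaction:
    cycle: int       # cycle at which the request is issued
    is_write: bool   # True for a write, False for a read
    address: int     # byte address within the memory map
    length: int      # burst length in bytes


def traffic_generator(base: int, size: int, burst_bytes: int, period: int,
                      count: int, write_ratio: float = 0.5,
                      seed: int = 0) -> Iterator[Transaction]:
    """Yield `count` burst-aligned requests, one every `period` cycles."""
    if burst_bytes <= 0 or period <= 0 or size < burst_bytes:
        raise ValueError("invalid traffic parameters")
    rng = random.Random(seed)          # seeded so that runs are repeatable
    slots = size // burst_bytes        # number of aligned bursts in the region
    for i in range(count):
        offset = rng.randrange(slots) * burst_bytes
        yield Transaction(cycle=i * period,
                          is_write=rng.random() < write_ratio,
                          address=base + offset,
                          length=burst_bytes)


def offered_bandwidth(burst_bytes: int, period: int) -> float:
    """Bytes per cycle that the generator asks the interconnect to carry."""
    return burst_bytes / period


# Hypothetical initiator: 64-byte bursts every 4 cycles into a 4 KiB region.
txns = list(traffic_generator(base=0x8000_0000, size=0x1000,
                              burst_bytes=64, period=4, count=8))
for t in txns:
    kind = "WR" if t.is_write else "RD"
    print(f"cycle {t.cycle:>3}: {kind} 0x{t.address:08X} len={t.length}")
print(f"offered load: {offered_bandwidth(64, 4)} bytes/cycle")
```

Changing `period` or `burst_bytes` changes the offered load, which is the kind of adjustment that lets different traffic profiles be tried against a NoC configuration without rewriting the testbench.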
These fast iterations enable the NoC configuration to be rapidly explored to find an appropriate solution, which is then tried with the floorplan so that both are optimised in tandem. This enables a stable state to be achieved much faster, which de-risks the project.
As the specifications for the chip can change with market needs, this whole modelling process is easily updated, without having to start again from square one, to show with evidence-based data how well the modified chip design will meet the new requirements.