#chetanpatil – Chetan Arvind Patil


The Rise Of Semiconductor Powered XPU
Photo by Laura Ockel on Unsplash

THE NEED FOR SEMICONDUCTOR POWERED XPU

Every software application eventually has to get executed on a hardware system. Whether the software application is running on a smartphone or a data center, the data processing request has to get decoded (binary instructions) before the hardware system can process the request successfully. This seamless exchange of processes between software and hardware forms the base for a computer system.

Software form factor, user interface, and speed might have changed over the years. However, the need for a processing unit that can execute all the software code has not. Over the last two decades, the de-facto processing unit – the Central Processing Unit (CPU) – has seen several semiconductor and computer architecture backed technological advancements that have taken computing to the next level.

As the software layers (application, presentation, session, and transport) become model-driven (more pro-active than re-active), the need to process unique data/compute/memory intensive requests at the hardware level grows with them.

Earlier, the traditional CPU was designed to handle a single request at a time. Then, the computing world moved towards multi-CPU (multi-core) designs to cater to the parallel computing demand. Today, the de-facto choice is a System-On-A-Chip (SoC) that packs the CPU, the Graphics Processing Unit (GPU), and other processing units to process high-resolution, high-speed, highly data-intensive requests in the shortest possible time.

Semiconductor Powered XPU (X Processing Unit) Are More Application-Specific Than General-Purpose

The integrated system (mainly CPU + GPU on an SoC) has not been able to keep up with the computing world’s data processing demand. It has pushed the computer architects to design new types of processing units (apart from high-bandwidth memory, cache coherence, and smarter interconnect topology) that are more application-specific than general-purpose.

This race to come up with the new type of processing units has given rise to XPU:
- X = Application Domain — Vision, Graphic, Neural, X-Reality, Video, Audio, and so on
- P = Processing
- U = Unit
XPU is different from CPU and GPU as it caters to the specific needs of the application. XPU is more application-specific, and it can work standalone or as a co-processor/co-unit alongside the CPU and GPU. XPU is geared towards throughput and speedy data management, taking the best of the CPU and GPU design methodologies to serve application-specific needs. Because of the workload it is designed to cater to, XPU is not only an Application-Specific Integrated Circuit (ASIC) but can also be classified as an Application-Specific Standard Product (ASSP).
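The co-processor arrangement described above can be pictured as a simple dispatcher: requests from a known application domain go to the matching unit, and everything else falls back to the general-purpose CPU. This is only a minimal sketch; the registry, the handler functions, and the domain names are hypothetical stand-ins and do not correspond to any vendor API.

```python
from typing import Callable, Dict

Handler = Callable[[bytes], str]


def run_on_cpu(payload: bytes) -> str:
    # General-purpose fallback for any domain without a dedicated unit.
    return f"CPU processed {len(payload)} bytes"


def run_on_vpu(payload: bytes) -> str:
    # Hypothetical stand-in for a vision-specific unit.
    return f"VPU processed {len(payload)} bytes"


def run_on_npu(payload: bytes) -> str:
    # Hypothetical stand-in for a neural-network-specific unit.
    return f"NPU processed {len(payload)} bytes"


# Hypothetical registry: application domain (the "X") -> processing unit.
XPU_REGISTRY: Dict[str, Handler] = {
    "vision": run_on_vpu,
    "neural": run_on_npu,
}


def dispatch(domain: str, payload: bytes) -> str:
    """Route a request to a domain-specific unit, else fall back to the CPU."""
    handler = XPU_REGISTRY.get(domain, run_on_cpu)
    return handler(payload)


print(dispatch("vision", b"abcd"))  # VPU processed 4 bytes
print(dispatch("audio", b"ab"))     # CPU processed 2 bytes
```

The fallback path is the design point: the XPU augments the CPU and GPU rather than replacing them.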

The rise of XPU is enabling a new era in computing. The hardware and the semiconductor markets are embracing the different challenges and solutions that XPU brings. Intel is betting big on it, and so is AMD. Apart from these two semiconductor giants, there are numerous innovative startups (and academic research groups) that have XPU powered solutions and are pushing the computing industry towards next-gen data processing.

Several types of XPU are available. It is vital to understand how these differ from each other and from the two processing giants – the CPU and the GPU.

Picture By Chetan Arvind Patil

THE XPU CATALOG

There are numerous XPU powered examples in the market. Many of these are still in a nascent stage and yet to prove the solution in the market. Given the growth and demand for new AI workloads, the XPU catalog will keep growing.

Below are the major semiconductor powered XPU:

AIPU – AI Processing Unit — AIPU targets Artificial Intelligence solutions and is mainly designed to cater to the Edge AI market. The MediaTek Helio series of SoCs is an example of an AIPU integrated with a CPU and GPU. RAIPU – Real AI Processing Unit, IPU – Intelligence Processing Unit or Image Processing Unit, and EPU – Emotion Processing Unit all fall under AIPU, with the only difference being the name. The goal of AIPU, RAIPU, EPU, and IPU is the same – to process data to make a decision that is on par with human intelligence.
APU – Accelerated Processing Unit — APU design requires fusing the CPU and GPU into a single die. AMD’s A-Series processor is a perfect example of an APU. APU is also capable of running a heterogeneous system by utilizing system-level architecture and software features.
AMPU – Analog Matrix Processing Unit — AMPU is designed to cater to training workloads, which often involve large parameter sets and many matrix multiplications. AMPU’s custom features for handling such parameters and matrix operations allow speedy training without relying on external memory. MYTHIC’s Analog Matrix Processor is an example of AMPU.
BPU – Brain Processing Unit — BPU is envisioned to mimic the human brain as it is. The processing unit is capable of performing multiple TOPS. Horizon’s Journey line of products is considered the first BPU ever and is designed in close collaboration with the Baidu Institute of Deep Learning. BPU may also form the base for Level 5 automation.
CPU – Central Processing Unit — CPU is the de-facto processing unit and is vital for general to specific purpose computing needs. Intel and AMD are leading the CPU innovation along with ARM. CPU is here to stay and will keep providing the much-needed multi-tasking capability for consumer applications. On top of that, academia is still engaged in research to make caches and data pipelines more energy efficient.
DPU – Dataflow Processing Unit | Data Processing Unit | Data Parallel Unit — DPU focuses on speeding up the data movement between the cores and the memory. It requires a new interconnect topology apart from the smart placement of sub-blocks to minimize bottlenecks. The instruction set allows for faster memory and compute-intensive processing. Intel already provides several SIMD instruction set extensions (such as AVX-512) to optimize processing on x86-64. DPU takes it to the next level with its highly optimized data pipeline, which enables massive parallelism. Fungible’s DPU is one such example apart from Deep Vision’s ARA series. RDPU – Reconfigurable Dataflow Processing Unit – is also a type of DPU.
DLPU – Deep Learning Processing Unit — A DLPU is similar to a DPU. It is a domain-specific solution to enable faster training. DLPU finds use in Edge AI and similar applications. Researchers first showcased DLPU in the DianNao paper, and more recently in the Cambricon-F paper.
GPU – Graphics Processing Unit – Like CPU, GPU has been in the market for a long time and is designed to cater to graphics applications. In the last few years, GPU has also found its way into AI/ML/DL applications. The highly parallel design of GPUs (with multiple cores and large memory) enables faster computation. GPU’s efficient programmability also allows faster training on datasets. NVIDIA is by far the leader in the GPU domain, and AMD is not far behind. Depending on how the GPU is integrated into the computer system, it also gets classified as: DGPU – Discrete Graphic Processing Unit, GPGPU – General Purpose Graphic Processing Unit, EGPU – External Graphic Processing Unit, IGPU – Integrated Graphic Processing Unit.
HPU – Holographic Processing Unit — Coming out of Microsoft, HPU is designed for X-Reality. It incorporates design to process rich information that is generated by the sensors and cameras on the X-Reality device. HPU incorporates processing units to implement DNNs.
MCPU – Micro Controller Processing Unit — MCPU is not used to run operating systems or frameworks, but instead is geared to run Real-Time Operating System (RTOS) powered solutions. MCPU finds use in automotive, remote devices, and even laptops and desktops to offload non-critical tasks. ARM, Texas Instruments, and others have been providing MCPU solutions for a few decades. In the AI-powered world, the number of solutions around MCPU is increasing, and architecture design is adapting to it.
NPU – Neural Processing Unit — NPU implements all the required blocks to enable faster data modeling using neural networks. Alibaba’s Ali-NPU is one such example. There are different types of NPU: NDPU – Neural Decision Processing Unit and NNPU – Neural Network Processing Unit. Ultimately, the end goal of all types of NPU is the same – to train on the data faster using a neural network and framework.
PPU – Physics Processing Unit — Mostly used in an academic environment, PPU is designed using FPGA to provide an architecture that can enable faster simulations. The SPARTA (Simulation of Physics on a Real-Time Architecture) project was the first to design a PPU. Then, Ageia (later acquired by NVIDIA) also attempted to provide PPU solutions by providing architecture benefits at the arithmetic logic unit (ALU) level. Intel Xeon Phi, PlayStation 2’s VU0, and even GPUs are types of PPU.
QPU – Quantum Processing Unit — QPU is a computational unit that uses quantum computing principles to perform a task. The physics used in QPU is drastically different from that of the general-purpose CPU. D-Wave Systems is the leader in QPU, and their QPU follows superconducting computing. Google, IBM, and Intel have QPUs based on a quantum circuit. QPU is massive, is not designed for the mass market, and is supposed to compete against AIPU.
SPU – Streaming Processing Unit — SPU is useful to process streamed data. The structured data enables placing the cores and memory to minimize the delay in bringing the new data to process. Google’s TSPU – Tensor Streaming Processing Unit – is an example of SPU. There are different types of SPU available in the market apart from TSPU: GSPU – Graph Streaming Processing Unit, TMPU – Texture Mapping Processing Unit, TPU – Tensor Processing Unit.
VPU – Vision/Visual/Video Processing Unit — VPU is coming out in the market due to the demand for providing a rich visual experience. VPU consists of more video encoding and decoding units to enable faster 3D processing. X-Reality is an application area where VPU finds use. Intel’s Movidius VPU is a perfect example of how to use processing units to process video applications with low latency.
WSPU – Wafer Scale Processing Unit — WSPU makes use of a full wafer to create a single processing unit instead of building processing units from individual fabricated dies. Trilogy Systems was one of the first companies to provide such a solution. Recently, Cerebras has taken the lead and provides a workable deep learning solution using wafer-scale integration that shows how a full wafer can be used elegantly.

With the growing need to process data faster and efficiently, the demand for semiconductor-powered unique XPU will keep growing.
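Several catalog entries above are described in terms of matrix multiplications and TOPS (tera-operations per second). As a rough illustration of how the two relate, the sketch below counts the operations in one matrix product and derives an idealized lower-bound runtime. The 4 TOPS rating and the matrix sizes are assumed values for illustration, not figures for any product named above, and real runtimes also depend on memory bandwidth and utilization.

```python
def matmul_ops(m: int, k: int, n: int) -> int:
    """Operations in an (m x k) @ (k x n) product, counting each
    multiply-accumulate as two operations (one multiply, one add)."""
    return 2 * m * k * n


def ideal_seconds(ops: int, tops: float) -> float:
    """Idealized runtime at a sustained rate of `tops` tera-operations/s."""
    if tops <= 0:
        raise ValueError("tops must be positive")
    return ops / (tops * 1e12)


ops = matmul_ops(1024, 1024, 1024)
print(ops)                            # 2147483648
print(ideal_seconds(ops, tops=4.0))   # well under a millisecond at 4 TOPS
```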

Picture By Chetan Arvind Patil

THE CHALLENGES FOR SEMICONDUCTOR POWERED XPU

XPU is unique and solves niche problems. The challenges that come with it are many. Below are the hurdles and challenges that semiconductor powered XPU faces:

Cost: Designing a new processing unit not only enables new features to run the workload in the most optimized way possible but also adds the cost of design, development, and manufacturing. Balancing the CapEx is always the de-facto goal of any organization, and in the case of XPU, the stakes are higher given the stiff market competition. Companies looking to come out with more XPU based designs and solutions will have to make the process cost-effective to break even faster.

Features: XPU is feature-specific. Deciding which problem the XPU is going to solve is difficult. The semiconductor industry has launched different types of XPU that cater to almost all possible computing domains and problems. Upcoming XPU will have to beat the existing XPU not only from the design point of view but also with respect to the features that make the new XPU sustainable in terms of power, performance, area, and cost.

Application: Defining use cases and features for XPU is another challenge. It requires figuring out the bottleneck in the existing applications/workloads and then designing the solution at the silicon level to solve it. TSU is one such example, which removes the bottleneck in training on the data. Such unique, feature-based design is what will make an XPU stand out in the market.

XPU Will Drive Innovation Along With General-Purpose Processing Units

Manufacturing: XPU needs to be fabricated along with the CPU, as another block with the CPU inside the SoC, or as a separate co-processing unit. In any of the three cases, it takes a unique semiconductor manufacturing process to ensure there are no process, quality, package, or reliability issues. Multi-Die Multi-Chip powered chiplets can be one way to design future XPUs for manufacturability. The technology node and the packaging technology need to be thoroughly tested before making the XPU with them. Reducing the cost of manufacturing for XPU will be another challenge.

Programmability: Hardware is designed to run the software. Developing and running applications on any piece of silicon (mainly those designed to run operating systems and frameworks) requires the support of system-level libraries to ensure the data being processed can make use of all the internal features. Semiconductor companies developing different types of XPUs will have to provide APIs and system-level hardware interface libraries. This requires time and cost to develop. To keep developers engaged, dedicated teams for continuous API development are required. Providing such software features is vital and also a challenge for the XPU design team.

Research And Development: Continuous R&D is a vital factor that leads to a new type of XPU. It is critical to ensure that the R&D team can collaborate with academia to innovate new processing blocks. Organizing a dedicated top-notch R&D team is still a challenge given the competition in the semiconductor industry. Investment is another differentiating criterion when it comes to advancing new XPU designs.

It is an exciting time to be in the XPU domain. General-purpose CPU and GPU are here to stay, and along with XPU they will enable a new, powerful, and efficient way to solve data problems.

However, the need to innovate in the competitive semiconductor industry will be the race to watch. Companies like Intel, Apple, AMD, Microsoft, Google, Amazon, and Facebook will play a crucial role in pushing the market for semiconductor powered XPU solutions, alongside promising FAB-LESS startups worldwide.
December 27, 2020
The Semiconductor Industry Is Driving The Automotive Industry

Photo by Vlad Tchompalov on Unsplash

THE NEED FOR SEMICONDUCTORS IN AUTOMOTIVE

The automotive industry is going through a significant transformation. Countries all over the world are pressing for greener and eco-friendly vehicles. Government policies are getting stricter and demand vehicular technologies that require smarter software and hardware.

The de-facto transformation is towards electric vehicles. However, other vehicular technologies (hybrid, autonomous, and alternate-fuel) are also driving the change in the automotive industry, and all this is pushing automotive companies to innovate. To keep up with smart and safe automotive features, the share of software and hardware in automotive is increasing tremendously, and new automotive technology companies are emerging as a result. The emerging automotive companies either work in collaboration with the established automotive companies or launch their own smarter electric, hybrid, autonomous, and alternate-fuel automotive products.

One of the pieces needed to make electric, hybrid, autonomous, and alternate-fuel vehicles is semiconductor products, as they form the base for the hardware required to run the software. In 2021, automotive production may be impacted by the shortage of semiconductor chips, which shows the dependency the automotive industry has on the semiconductor industry.

The Semiconductor Industry And Products Are Key To Developing Smarter Hardware To Drive The Automotive Industry Using Smarter Software

The need for semiconductor solutions in automotive is also pushing the innovation and development of smarter automotive chips to drive vehicles safely. The increasing need to develop error-free software that runs on efficient hardware is driving the automotive industry to re-invent automotive solutions from a semiconductor point of view.

It is not that semiconductor products started getting used by the automotive industry in 2020. The automotive industry has relied on semiconductor products for decades. From airbags to infotainment, many solutions have always required defect-free semiconductor products. However, the landscape of semiconductors in automotive is changing from individual silicon components to more centralized silicon systems. The individual semiconductor products work well for solo operations.

However, when the goal is to connect the system for level 5 autonomy or increasing hybrid efficiency, a centralized automotive-specific system-on-a-chip (SoC) is required. The centralized automotive semiconductor-based solution still relies on individual semiconductor components and is leading to innovative work from both the established automotive firms and the new semiconductor FAB-LESS companies looking to eat into the automotive and semiconductor market.

A centralized silicon system needs to cater to the following use cases for an autonomous automotive solution:

Capture Information: Ability to capture information with the help of monitoring sensors, LIDAR, and RADAR
Process Data: Information captured should be processed without delay
Take Decision: The processed data is used to take accurate actions
Autonomy: Decide on behalf of the driver to enable a safer experience
Management: Track electrical and mechanical activities to provide vehicle health
Infotainment: Display, audio, and video system for entertainment
Connectivity: On-the-go navigation, Bluetooth, 5G, and WiFi connectivity
Safety: Ensure critical components and features are working and alert when maintenance is required

The above use cases of semiconductors are valid for any kind of two/three/four-wheeler automotive solution, including passenger, commercial, motorcycle, and industrial automobiles.
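The first three use cases in the list (capture information, process data, take decision) form a pipeline, which can be sketched in a few lines. This is a toy illustration only: the `SensorFrame` fields, the time-to-collision calculation, and the 2-second braking threshold are assumptions made for the example and are not taken from any automotive standard or product.

```python
from dataclasses import dataclass


@dataclass
class SensorFrame:
    # Hypothetical fused reading produced by the capture stage
    # (for example from LIDAR/RADAR and a speed sensor).
    obstacle_distance_m: float
    speed_mps: float


def process(frame: SensorFrame) -> float:
    """Process stage: time-to-collision in seconds (inf if not moving)."""
    if frame.speed_mps <= 0:
        return float("inf")
    return frame.obstacle_distance_m / frame.speed_mps


def decide(time_to_collision_s: float, brake_threshold_s: float = 2.0) -> str:
    """Decision stage: brake when time-to-collision is under the threshold."""
    return "BRAKE" if time_to_collision_s < brake_threshold_s else "CRUISE"


frame = SensorFrame(obstacle_distance_m=30.0, speed_mps=20.0)
print(decide(process(frame)))  # BRAKE, since 1.5 s is under the 2.0 s threshold
```

On a centralized SoC, each stage would run on the block best suited to it, and the requirement to process "without delay" applies to the whole chain rather than to any single stage.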

Developing semiconductor solutions for automotive requires strong collaboration with the semiconductor design and manufacturing industry. In some cases, the semiconductor solutions are also being developed in-house by automotive companies. This is why automakers are investing in or building R&D facilities to come up with the high-precision silicon needed for future vehicles.

Picture By Chetan Arvind Patil

THE DEVELOPMENT OF SEMICONDUCTORS IN AUTOMOTIVE

It is critical to make use of a high-level intelligence system to make automotive products smarter. The growing share of technology-powered features in modern vehicles is prompting the automotive industry to invest in software and hardware capabilities in-house. In many cases, automotive companies are also collaborating and investing by out-sourcing many of the vital semiconductor solutions, from silicon chips to sensor systems.

Automotive giants and the silicon chip:

BMW: To remove its dependency on outsourced power semiconductor products, BMW has invested in a GaN semiconductor startup. In parallel, BMW has also collaborated with Intel to drive its fully autonomous vehicle project.
BOSCH: Like DENSO, BOSCH has been a provider of, and one of the leaders in, automotive components. Given the growth of alternate fuel technology and the dependency on semiconductor products, BOSCH has upped its ante and in the course grabbed billions of vehicle computer orders. BOSCH is also one of the rare automotive companies to own semiconductor FABs, and this puts BOSCH in a unique position to not only design in-house but also manufacture. Recently, BOSCH also launched a new chip that promises to be a game-changer in navigation technology.
DENSO: While DENSO is partly owned by Toyota, it still does a lot of work on its own to power future automotive technologies. In line with its ambitions to power EVs, DENSO has invested in startups focusing on semiconductor solutions to manage power and performance. On the other hand, DENSO has also taken equity in Infineon Technologies to enhance its semiconductor portfolio. It has also formed a new semiconductor company called MIRISE Technologies in collaboration with Toyota, to develop next-generation in-vehicle semiconductors.
Daimler: To compete with Tesla and other automotive companies, Daimler has partnered with Nvidia to speed up its use of next-generation silicon chips. Apart from this, Daimler has also increased investment in battery manufacturing. An interesting fact – In 1997 Daimler-Benz AG sold its semiconductor business unit TEMIC Telefunken Microelectronic GmbH to Vishay Intertechnology, and now it is going back to the same business area to keep up with the growing share of semiconductors in automotive products.
Ford: To cater to its electric vehicle and autonomous technologies, Ford is working with Mobileye and Intel to drive its semiconductor needs.
Hyundai: Hyundai has a long-term plan to remove the dependency on automotive products. In line with its vision, Hyundai has launched a semiconductor lab to develop semiconductor products for electric/hybrid vehicle powertrain controllers.
Honda: A decade ago, Honda invested in Shindengen Electric Manufacturing. Given how big Honda is in the two-wheeler segment and the growing electric vehicles market, Honda can reap the benefits of the strategic investment made in Shindengen Electric.
Nissan: Nissan has collaborated with Renesas for its innovative semiconductor needs.
NXP: NXP Semiconductor is one of the leaders in automotive solutions. Apart from a strong portfolio and design capabilities, NXP has an advantage in its manufacturing expertise in automotive products. The growing line of products for automotive will certainly make NXP stand out in the market.
Toyota: Since 2014, Toyota has been using innovative semiconductor technologies to enable higher fuel efficiency. Last year, Toyota formed a JV with DENSO to focus on in-house silicon chip development due to the exploding cost of semiconductors in automotive.
Tata Motors: Tata is one of the leaders in passenger and commercial vehicles in India. Its semiconductor arm Tata ELXSI is capable of providing all the semiconductor solutions it needs, apart from its decade-long collaboration with other automotive semiconductor product providers.
Volkswagen: A few years ago, Volkswagen formed a deep partnership with Infineon Technologies to drive its TRANSFORM 2025+ strategy in line with semiconductors in automotive.

There are numerous examples of big automotive giants already owning or investing in semiconductor chip development. Given the decreasing cost of developing an automotive solution using advanced intelligence techniques, several startups and newcomers are also shaking the automotive market by providing semiconductor solutions.

Newcomers in automotive silicon chip:

Autotalks: A FAB-LESS startup providing semiconductor solutions for vehicle-to-everything (V2X) communications for the automotive industry.
Apple: Not official yet, but there are numerous reports of Apple’s electric car. If that happens, it is for sure going to be an in-house development.
Argo AI: Developing its own self-driving technology from software to hardware, Argo AI is another startup getting into automotive semiconductors. It also recently merged with Audi’s AI center.
GEO Semiconductor: A FAB-LESS semiconductor startup that is providing integrated circuits (ICs) for video and geometry processing.
indie semiconductor: indie Semiconductor makes SoCs for the automotive industry. It has changed its portfolio in the last few years but has certainly got a front foot in the connected car and infotainment business.
NIO: NIO is a China-based EV maker that has decided to go big for in-house silicon chip development to fulfill its need for semiconductors.
NIKOLA: NIKOLA is another promising automotive company. It has been focusing on designing and manufacturing electrical components for greener vehicles. So far, NIKOLA has focused on semi and soon plans to venture into the passenger segment.
SLD Laser: Started in 2013, SLD Laser is pushing the development of laser-based sensor solutions for the automotive industry.
Silicon Mobility: Founded in 2015, Silicon Mobility is providing a semiconductor-based mobility solution to make EV more efficient.
Tesla: While Tesla is not new to the automotive market, its silicon solution certainly is. It aims to provide a silicon sandbox that is going to make any vehicle an autonomous one. The Tesla AI chip is still limited to in-house use, but will certainly open up the market if sold separately.
Waymo: Owned by Google, Waymo, for now, is using Intel’s technology but is also planning to develop its own silicon chip in-house.
Zoox: Amazon bought Zoox early this year and also unveiled its self-driving car. With Amazon’s experience in developing hardware via Amazon Lab126, Zoox’s automotive hardware solution will be innovative.

The list of newcomers and startups in automotive semiconductors is going to increase. It will be interesting to see how the semiconductors market adapts to the solutions from the smaller companies.

It is for sure going to increase the importance of semiconductor design and development. However, there are still major challenges to overcome before automotive companies can make it big in the semiconductor industry.

Picture By Chetan Arvind Patil

THE CHALLENGES FOR SEMICONDUCTORS IN AUTOMOTIVE

Automotive companies that have long been in the industry can establish a semiconductor business unit to meet their needs. The challenges arise for newcomers wanting to cater to the automotive industry.

Five major challenges might hinder the progress of newcomers from providing elegant automotive semiconductor solutions:

Cost: Even though the share of semiconductor cost in new vehicles is on the rise, the challenge remains on optimizing the development and manufacturing cost to reach the breakeven point. Optimizing the process to make a low-cost product for large scale consumption is still a challenge. On top of it, proving the smarter semiconductor driven solution to the market is getting tougher. Autonomous (not specifically self-driving) technologies take years to test on the road before they can be used by the mass market. All this adds to the cost.

Talent: Acquiring relevant software and hardware talent to bring innovative silicon solutions to the market is another challenge. Automotive companies getting into self-driving, electric vehicles, and alternate-fuel solutions are doing all they can to form the best team. In some cases, the practice is not to everyone’s liking. Bringing new talent on board and training them takes years too. Challenges also remain for universities to launch programs that cater to the new-age automotive and semiconductor market, which requires different skills and demands. An interdisciplinary study that combines mechanical, computer, electrical, and semiconductor engineering is the need of the hour.

Policies: The lives of many on the road are at stake, and Government policies play a crucial role in the automotive industry. Not all countries or states enable the framework to test autonomous solutions out on the road. Dedicated infrastructure is required to test solutions like lane guidance and autonomous driving. It is difficult for states to grant permission to test the solution on public roads due to dangerous unknown consequences. There are only a few examples of how state policy can strike a balance between technological progress and road safety. Arizona is one such example. The policy has led Chandler in Arizona to become the hub for self-driving design and development. From Waymo to Uber to Cruise, all are testing their solutions out on the road. It shows the growing need to make policies that strike a balance between safety and future market needs.

The Semiconductor Automotive Product Development Demands Talent Pool Trained With Interdisciplinary Curriculum Covering Mechanical, Computer, Electrical And Semiconductor Engineering.

Reliability: Semiconductor automotive products have to go through stringent qualification, testing, and reliability criteria. The Automotive Electronics Council (AEC) and a few other standards provide the guidelines for qualifying the semiconductor products. The guidelines require testing that ranges from temperature to reliability stress. All these require resources, time, cost, and talent to execute. A chip not functioning during a crash can lead to a fatality. Making automotive semiconductor products reliable is a key concern.

Manufacturing: Automotive semiconductor products eventually have to get manufactured in the same arena where smartphone semiconductor products get manufactured. A semiconductor FAB and OSAT have to maintain strict control over the process to ensure zero variation between lots of the same product. Meeting such a requirement demands a strong industry flow that enables defect-free products. FAB-LESS companies developing semiconductor products for the automotive industry have to engage and invest with semiconductor manufacturing teams.

The challenges are many, but so are the opportunities. Companies have to increase focus on semiconductor products in the automotive industry at the chip level. The automotive industry has leveraged semiconductor solutions for decades.

The next decade is going to be a game-changer. The share of semiconductor products and costs in automotive is only going to quadruple. All this will open tremendous opportunities both for the semiconductor and the automotive industry.

December 20, 2020
The Heterogeneous Integration Is Pushing The Semiconductor Industry

Photo by Mika Baumeister on Unsplash

THE GROWING NEED FOR HETEROGENEITY IN SEMICONDUCTOR

The computing world today is all about processing data in real-time. Developers expect their code to compile in milliseconds. Consumers expect applications to respond with zero delay. All of this requires seamless communication between different computing components, mainly the software (code) and the hardware (chip).

The computing world is pitched against the human brain. The ultimate goal is to outperform the human brain’s ability to sense, think, and act. While computers are outpacing humans, the pursuit of the silicon brain is still ongoing. Eventually mimicking the human brain’s capabilities (mainly – sense, think, and act) demands much more computational speed and optimization than is available today.

The need to reduce the time to run the compiled code on the chip has pushed both the software and the hardware (semiconductor) industry.

The software industry has been consistently coming up with unique ways to handle the data to avoid thrashing. Efficient use of parallel programming to split the single process/task into threads has been one major factor. Software developers (mainly frameworks and programming ones) have also been pro-actively utilizing all the hardware features (SVMS – Scalar, Vector, Matrix, Spatial) to enable a rich user experience by processing the data faster.
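Splitting a single task into threads, as described above, usually follows a split, process, and combine pattern. The sketch below shows that pattern with Python's standard `concurrent.futures.ThreadPoolExecutor`; the helper names `chunk` and `parallel_sum_of_squares` and the workload itself are made up for illustration. Note that in standard CPython the global interpreter lock limits the speedup for CPU-bound work like this, so the example shows the structure rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, parts):
    """Split `data` into at most `parts` contiguous slices."""
    size = (len(data) + parts - 1) // parts  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(part):
    return sum(x * x for x in part)


def parallel_sum_of_squares(data, workers=4):
    """Split one task across threads, then combine the partial results."""
    if not data:
        return 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, chunk(data, workers))
        return sum(partials)


print(parallel_sum_of_squares(list(range(10))))  # 285
```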

To Reduce Time To Completion While Balancing Performance-Per-Watt Is Pushing The Need For Heterogeneous System Architecture

On the other side, the hardware (semiconductor) industry is also innovating consistently (and trying to keep up with the software industry’s demand) to provide more performance-per-watt (PPW) that ensures the complex applications/workloads run efficiently. CPU/GPU/FPGA/ASIC design has seen not only architecture-level innovations but also transistor-level ones. Shrinking transistor size made it possible to fabricate a System-On-A-Chip (SoC) with billions/trillions of transistors in it. Transistor-level innovation has also allowed AI workloads to thrive. However, the SoC is hitting the design-wall and demands an innovative approach to cater to future workloads.

The SoC hitting the design-wall has pushed the semiconductor industry towards the Heterogeneous System Architecture (a.k.a. heterogeneous integration), which combines the best of the hardware capabilities to form a unique computing system. It allows the workloads to reduce time to completion while balancing the power-to-performance ratio.
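The trade-off between time to completion and performance-per-watt can be made concrete with a small sketch. It assumes each processing unit is described only by a sustained throughput and an average power draw, and that PPW is throughput divided by power. The `Unit` class, `pick_unit`, and all figures are hypothetical and do not describe any real chip.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    ops_per_second: float  # sustained throughput (hypothetical figure)
    watts: float           # average power draw (hypothetical figure)

    def time_to_completion(self, total_ops):
        return total_ops / self.ops_per_second

    def performance_per_watt(self):
        return self.ops_per_second / self.watts

    def energy_joules(self, total_ops):
        return self.watts * self.time_to_completion(total_ops)


def pick_unit(units, total_ops, deadline_s):
    """Among the units that meet the deadline, pick the one with the best PPW."""
    feasible = [u for u in units if u.time_to_completion(total_ops) <= deadline_s]
    if not feasible:
        return None
    return max(feasible, key=lambda u: u.performance_per_watt())


units = [
    Unit("cpu", ops_per_second=1e9, watts=10.0),
    Unit("gpu", ops_per_second=1e10, watts=200.0),
    Unit("asic", ops_per_second=5e9, watts=20.0),
]
```

With these made-up numbers, a relaxed deadline favors the most efficient unit, while a tight deadline forces the choice of the fastest one even though its PPW is lower.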

Heterogeneous System Architecture requires unique semiconductor techniques that enable processing unit designs built using the best innovation out of the CPU, GPU, FPGA, and ASIC designs. It also drives the manufacturing process towards advanced technology nodes, packaging technology, and novel equipment.

Picture By Chetan Arvind Patil

THE SEMICONDUCTOR INNOVATION TO ENABLE HETEROGENEITY

Academia and the industry have been putting forward ways to design and manufacture architectures that can fit the demand for heterogeneous system architecture.

The heterogeneous system architecture can be classified into three categories:

Synchronous: Synchronous heterogeneous system architecture uses a single voltage, frequency, clock, and power domain for all of its processing units/cores. Multiple clusters with cores can exist, with each cluster designed using a unique data pipeline technique such that clusters are capable of operating at different speeds/frequencies. However, the processing units/cores within a cluster always run under the same voltage and power scheme. Apart from the CPU, the GPU (running on a separate voltage and power domain) is the only other type of processing unit that is part of the synchronous heterogeneous system architecture. ARM big.LITTLE is one such example.
Asynchronous: Asynchronous heterogeneous system architecture borrows everything from the synchronous one, but it may also allow voltage, frequency, clock, and power scaling at the level of individual processing units/cores. This helps in fusing cores/units on the same SoC in a way that improves PPW. Qualcomm’s Snapdragon is one such example. However, the data pipeline of all the cores in the Snapdragon is the same. In reality, a truly asynchronous heterogeneous system architecture is not yet available, as it demands innovative transistor-level techniques to drive a per-core power domain apart from different core designs. There are thermal challenges too. Asynchronous heterogeneous system architecture often has other types of processing units/cores (FPGA, ASIC, GPU) apart from the CPU.
Fusion: In many architectures that form heterogeneous system architecture, per-core/unit level power management is not available. Instead, a fusion heterogeneous system architecture technique of combining different types (CPU, GPU, FPGA, ASIC, NPU, XPU, and so on) of processing units/cores is used. Each of these processing units may have separate power management. AMD’s Accelerated Processing Unit is an example of fusion without per processing unit/core power management. Fusion-based heterogeneous system architecture demands advanced technology nodes, packaging technology, and novel equipment.

Whether one is designing synchronous or asynchronous or fusion heterogeneous system architecture, below are the five pillars of heterogeneous system architecture:

Technology Node: Integrating different processing cores/units to create a heterogeneous system architecture requires advanced technology nodes. The dies or the cores/units that get fused often have to get fabricated with the smallest possible transistor size. A true heterogeneous system architecture is supposed to make use of different technology nodes for the same integrated system. Example: A CPU inside a heterogeneous system architecture may be 5nm, while the GPU might not be 5nm. This pushes semiconductor fabrication companies to keep innovating on the transistor size and also on the device/transistor design/type (Planar FET, FinFET, GAAFET, MBCFET, etc.). Investing in Process Design Kits (PDK) and Electronic Design Automation (EDA) tools that can aid defect-free design is also required, apart from developing fabrication facilities that can turn designs into silicon chips. All this puts pressure on the fabrication part of the semiconductor manufacturing process.
Packaging Technology: Heterogeneous system architectures also require new ways to package dies/cores/units in a single platform. Chiplet-based Multi-Die Multi-Chip Modules in a System-In-A-Package (SiP) are becoming a de-facto packaging standard for heterogeneous integration. Intel has also come up with many new solutions around heterogeneous integration. DARPA has also been pushing for Common Heterogeneous Integration and IP Reuse Strategies (CHIPS) by collaborating with academia and the industry. To keep up with heterogeneous integration demand, the package technology roadmap has to be continuously revisited and aligned with the fabrication process.
Interconnect: Faster data movement is key to enabling optimization on heterogeneous system architecture. Whether it is within a processing core/unit or in-between two or more, high bandwidth is vital. Many are proposing silicon photonics-based interconnects that provide a high-speed interface. There are still open questions about the power requirements for silicon photonics-based solutions. It might be possible to make use of a photonics-based solution together with electrical interconnects. Researchers have also proposed several solutions for heterogeneous silicon photonic interconnects. Eventually, continuous research and development is a must, as tomorrow’s heterogeneous system architecture will be far more complex than today’s.
Memory: Like the high-speed interconnect, high-bandwidth memory is also vital for heterogeneous integration to enable faster read/write for processing core-to-core (unit-to-unit). It may also act as a memory-side cache. Intel’s MCDRAM is one such example. AMD also has Heterogeneous Uniform Memory Access (hUMA) for fusion-based heterogeneous system architecture. Recently, Micron launched 176-Layer NAND, which delivers high performance and density. Similar techniques are required to enable faster input/output in heterogeneous system architecture.
Software: Efficiently scheduling tasks on a heterogeneous platform needs APIs that let developers map software onto the target heterogeneous system architecture and access its internal functional units and drivers. Intel’s oneAPI provides exactly such an interface for its heterogeneous platform. Another approach is utilizing the Heterogeneous System Architecture Intermediate Language (HSAIL), which acts as an ISA for parallel compute routines. Software developers also need to make use of all the internal hardware features to drive the fastest time to completion.

Irrespective of the type of heterogeneous system architecture used – synchronous, asynchronous, or fusion – the above five components are crucial to take full advantage of the heterogeneous system architecture and its capabilities.
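To make the software pillar more tangible, here is a minimal sketch of mapping tasks onto different processing units. It is a generic greedy earliest-finish-time heuristic, not how oneAPI or HSAIL work. The `schedule` function, the unit names, and the run times are all hypothetical, and the sketch assumes the per-unit run time of every task is known in advance.

```python
def schedule(tasks, units):
    """
    Greedy mapping of tasks onto heterogeneous units.

    tasks: list of (task_name, {unit_name: run_time_seconds}) in submission
           order. A unit missing from the dict cannot run that task.
    units: list of unit names.
    Returns (assignment, makespan).
    """
    free_at = {u: 0.0 for u in units}  # time at which each unit becomes idle
    assignment = {}
    for name, costs in tasks:
        candidates = [u for u in units if u in costs]
        if not candidates:
            raise ValueError(f"no unit can run task {name!r}")
        # Pick the unit on which this task would finish earliest.
        best = min(candidates, key=lambda u: free_at[u] + costs[u])
        free_at[best] += costs[best]
        assignment[name] = best
    return assignment, max(free_at.values())


# Hypothetical workload: run times in seconds per unit type.
tasks = [
    ("decode", {"cpu": 4.0, "gpu": 1.0}),
    ("matmul", {"cpu": 8.0, "gpu": 2.0, "npu": 1.0}),
    ("parse", {"cpu": 1.0}),
    ("render", {"cpu": 6.0, "gpu": 2.0}),
]
assignment, makespan = schedule(tasks, ["cpu", "gpu", "npu"])
```

A real runtime would also have to account for data movement between units, which is where the interconnect and memory pillars come in.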

Picture By Chetan Arvind Patil

THE IMPACT OF HETEROGENEOUS SYSTEM ARCHITECTURE AND INTEGRATION

The impact of deploying heterogeneous system architecture is largely on the semiconductor manufacturing process due to the highly complex nature of fabrication, testing, and assembling the different types of sub-units using heterogeneous integration technology.

FAB: Semiconductor FABs have to always keep innovating new types of transistor devices, interconnects (TSV Interposers, etc.), and most importantly the advanced technology nodes. The demand and pressure to produce zero-defect products while the transistor size decreases is a challenge in itself. FABs like TSMC and Samsung have already started work on 3nm well before large-scale production of 5nm, which is a challenging task. Lower technology nodes will enable highly complex silicon that will most likely be part of the heterogeneous system architecture. Fabricating 7/5/3 nm and beyond not only requires massive investment (upward of $10 billion) but also demands continuous research and development in close collaboration with academia. Heterogeneous integration is a vital market and is pushing the FABs to get into the packaging domain. TSMC already has a TSMC-SoIC solution for heterogeneous chiplet integration. Samsung and others will soon follow suit.

OSAT: OSATs are preparing by upgrading their packaging solutions to align with the heterogeneous system architecture need. Advanced packaging techniques like Fan-Out and SiP require near-perfect materials and assembly recipes. ASE Global already has a roadmap to cater to the heterogeneous market. Other top OSATs (Amkor, JCET, and SPIL) are also working on heterogeneous integration strategies. Amkor recently delivered the industry’s first package Assembly Design Kit to speed up accurate design and verification of heterogeneous integration packages. Likewise, JCET and SPIL have also ramped up efforts for heterogeneous integration. Intel already has many in-house solutions for heterogeneous integration. Intel also recently won the State-of-the-Art Heterogeneous Integration Prototype (SHIP) project from the U.S. Department of Defense.

Heterogeneous System Architecture Is Pushing Semiconductor Manufacturing To Innovate

EQUIPMENT: Heterogeneous integration needs equipment that enables not only accurate testing but also assembly (Chiplets, Multi-Chip Multi-Die Modules, and SiP) without compromising on the specification, and this puts pressure on the suppliers. The majority of the OSATs providing heterogeneous solutions will have to either upgrade their infrastructure or invest in new facilities. It is directly pushing semiconductor equipment providers to come up with new solutions.

COST: Eventually, aligning FAB to OSAT to equipment for heterogeneous integration requires CapEx. The added cost to design, fabricate, test, and assemble will increase the cost of development. It might directly affect the cost of goods sold, and semiconductor companies will have to come up with new techniques for viable product development to break even.

YIELD: All of the above factors eventually impact the yield. The more complex the product is, the more difficult it is to keep the yield high. Maintaining a high yield becomes a challenge due to the new way to test the system. This challenge stems from the complex fabrication and assembly process that the integrated approach brings. It also means investing in new test hardware, probe cards, and automated test machines to handle heterogeneous testing.
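A back-of-the-envelope sketch shows why yield gets harder as more dies are integrated. It assumes that a package works only if every die and the assembly step are good, and that these failures are independent, which is a simplification. The `package_yield` helper and the yield figures are illustrative only.

```python
from math import prod


def package_yield(die_yields, assembly_yield=1.0):
    """Yield of a multi-die package, assuming independent failures."""
    die_yields = list(die_yields)
    for y in die_yields + [assembly_yield]:
        if not 0.0 <= y <= 1.0:
            raise ValueError("yields must be between 0 and 1")
    # Every die and the assembly step must be good for the package to work.
    return prod(die_yields) * assembly_yield


# Hypothetical figures: four dies at 95% each, assembly at 98%.
four_die_package = package_yield([0.95] * 4, assembly_yield=0.98)
```

Under these assumptions, four dies that each yield 95% already bring the package below 80%, which is one reason testing dies before assembly matters so much.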

The heterogeneous system architecture is pushing semiconductor design and manufacturing towards new possibilities. It will be crucial to see how both the FAB and OSAT innovate and work in close collaboration with EDA and FAB-LESS/IDM houses to drive the era of heterogeneous integration.

December 20, 2020
The Race For AI Semiconductor Chips
Photo by david latorre romero on Unsplash

THE NEED FOR AI SEMICONDUCTOR CHIPS

Multi-Core Processor (MCP) or Chip Multi-Processor (CMP) revolutionized the computing industry. MCP/CMP came up with advanced execution and parallelism techniques. Software took the opportunity provided by the multiple processors fused into the single System-On-A-Chip (SoC).

MCP also provided the advantage of Out-of-Order Execution (OoOE), instruction-level parallelism (ILP), thread-level parallelism (TLP), and interleaved Simultaneous Multithreading (SMT), and allowed multiple applications to run on the same processor or multiple cores in the same SoC. Soon, the Single Instruction Stream, Single Data Stream (SISD) evolved into Multiple Instruction Streams, Multiple Data Streams (MIMD). MIMD gave a new experience to the data-intensive applications in the post-internet era.

The semiconductor and computing industry took advantage of MCP for over a decade by incorporating different core/processing units into a single SoC. Multi-Processor/Core System-On-A-Chip (MPSoC/MCSoC) became the heart of the new data and memory-intensive applications. MPSoC/MCSoC started to come up with dedicated processing blocks for graphics (GPU), digital signal (DSP), vector/vision (VPU), and neural (NPU) data, along with High-Bandwidth Memory (HBM).

The Artificial Intelligence System-On-A-Chip (AISoC) Is The Need Of The Future AI-Driven Workloads And Applications

The software computing industry is demanding that semiconductor chips process data faster than ever. Shrinking the transistor size further is not allowing the data and memory-intensive AI/ML/DL workloads to make the best of the MPSoC/MCSoC. Even though there are many opportunities to improve and innovate by proposing smarter data management techniques (cache, memory, and threading), MPSoC/MCSoC seems to have hit the memory wall, area wall, power wall, thermal wall, and performance wall. The data centers that should be shrinking in size and space due to the technology node advancement are instead becoming larger by churning out massively distributed systems with a large number of MPSoC/MCSoC connected with large memory (NUMA).

The data-intensive, compute-intensive and memory-intensive artificial intelligence applications/workloads demand SoC that is:
- Low Cost:
  - Affordable to manufacture
- Efficient:
  - Improved performance-per-watt (PPW)
- High Parallelism:
  - Massively parallel execution without stalling
- Smart:
  - Ability to generate/store/predict models on the go that reside closer to the cores
- Zero Bottlenecks:
  - Processes the data without memory/interconnect bottlenecks with or without a co-processor
- Adaptive Software:
  - Ability to get programmed with minimal high-level programming effort and adapts on the go
- High-Speed Memory:
  - Provides a large amount of high bandwidth memory across different memory levels/hierarchies
- Technology Node:
  - Works efficiently irrespective of the advanced technology nodes used
The above eight features are what will pave the way for the Artificial Intelligence System-On-A-Chip (AISoC). These AISoC will be critical for the next generation of advanced solutions that will find use in the growing autonomous world. AISoC can be used in all devices and not only in the data centers. AISoC can speed up fast-changing industries, from automotive to satellite.

To cater to the growing demand for semiconductor SoC chips for Artificial Intelligence, and to also balance the complexity, cost, and time to market, the semiconductor industry has already started to move away from general-purpose cores to specialized cores.

While the semiconductor industry is not labeling these new SoC as AISoC, the features offered are of the AISoC world. Not all the AISoC solutions adhere strictly to the eight features discussed above, but the solutions offered by different semiconductor companies are a step in the right direction.

Picture By Chetan Arvind Patil

THE STATUS OF AI SEMICONDUCTOR CHIPS RACE

Leadership in AI semiconductor chips is vital. Countries across the world are competing to bring the best homegrown solution to establish the lead. Governments are also funding the semiconductor chip business with the hope of leading the semiconductor race and mainly the AI solution one.

Apart from governments, companies across the globe are also racing against each other. From software giants to hardware leaders, all companies are investing enormous amounts of time and money to bring AI semiconductor chips to the market.

The Artificial Intelligence System-On-A-Chip (AISoC) development is happening in two parallel worlds:
- Established companies building in-house AI semiconductor chips
- Startups providing a new architectural solution to drive AI semiconductor chips market
Below is a snapshot of the world’s top established companies racing against time to bring AISoC not only for their own consumption but also for the market:

Alibaba: Alibaba competes directly with e-commerce giants, mainly Amazon. It provides web services similar to Amazon Web Services. To cater to enterprise needs, Alibaba last year launched Hanguang 800, which is capable of processing 78,563 images per second. Alibaba also introduced XuanTie 910 in 2019, which provides 40% more performance than the reference RISC-V ISA. These two AISoC are only a handful of examples. Alibaba’s DAMO Academy is continuously innovating and is going to launch many more surprising products in the AISoC domain.
Alphabet/Google: Alphabet’s Google arm has always been into hardware research and development. Google’s Brain and Hardware and Architecture team has been continuously providing solutions to make AI algorithms faster and smarter. Several AI-related hardware solutions have come out of Google. Google Cloud’s TPU is already becoming a benchmark for the AI industry. There are already many solutions that promise to improve the time to train networks using large data sets. Google is also pushing the envelope by taking the help of AI to design AI chips. With Pixel’s line of products, there is more room to innovate. In the coming years, Google will showcase innovative AISoC solutions.
Amazon: Amazon caters to more than 200 million visitors per month. Every visitor provides Amazon business and also the data on his/her shopping behavior. To process and make use of such unique data and to also provide enterprises with efficient web services, Amazon has been investing in AI-driven chips for a long time. Amazon’s Inferentia is the first step towards conquering the server market that is AISoC powered. The growing Alexa line of products pushed Amazon towards in-house AISoC development, and the results are already being seen in the form of smarter voice-assisted devices.
AMD: AMD is another established semiconductor company with AISoC products. With the AMD Instinct and AMD EPYC lines of products, AMD has been steadily growing its market share in AI-enabled devices. AMD is also making the most of semiconductor chiplet technology to bring more innovation at the silicon level. Its acquisition of Xilinx is only going to help bring more AISoC solutions to the market. AMD is not deep into the mobile space, but it can certainly take advantage of the growing gaming industry to compensate. AMD CDNA is another breakthrough architecture design to speed up high-performance computing.
Apple: With the launch of M1, Apple has shown the world its next target is going to be more in-house MacBook and iPhone/iPad processors. M1 has an NPU that allows faster predictive actions for its users. Apple is also planning to launch X-Reality products, which will require elegant AISoC, for which Apple has already started the work.
ARM: ARM IP has been critical for the smartphone industry, and ARM has taken the lead in providing AI-powered chips for mobile and also the data centers. With ARM-powered supercomputers topping the TOP500 list, ARM is ready to come up with more AISoC solutions. Smart homes, wearables, and smartphones will see a massive use of AI chips that will be powered by ARM. Apple going all-in on ARM processors will also help ARM innovate on the AISoC front.
Baidu: Baidu is a giant in China and competes worldwide with Google, Amazon, and Alibaba. Baidu showcased the Kunlun AI processor during HotChips 2020 that is designed and produced in collaboration with Samsung Electronics. Kunlun is capable of catering to diverse AI workloads and claims to have three times more performance than NVIDIA AISoCs. It will be interesting to see how Baidu goes all in-house with AISoC designs.
Facebook: There are 2.7 billion people who use Facebook every month. To serve the growing requests, Facebook has been developing in-house silicon that takes advantage of AI to provide faster training. Zion, Kings Canyon, and Mount Shasta are three major AISoC that Facebook has innovated to run its hardware infrastructure efficiently. It has ramped up its effort to develop more in-house AISoC, and the results will be out in the coming years.
Huawei/HiSilicon: A subsidiary of Huawei, HiSilicon has been innovating fast to cater to the AI smartphone and data center market not only in China but also in the majority of the developing nations. The Kirin and Ascend lines of products have done wonders for Huawei devices. Huawei has also launched AISoC for data centers. It will be vital to see how Huawei and HiSilicon innovate in the next few years and expand their AISoC portfolio.
IBM: IBM has been a quiet leader in smart technologies. Watson has done wonders for the AI industry and also pushed other companies to innovate faster. IBM has innovated to accelerate DNN training with the help of CMOS and new AI Cores. IBM has been focusing on Analog and Digital AI cores that enable dynamic and hybrid cloud systems. IBM is one of the few companies that not only provides AISoC-based solutions but also innovates at the transistor level. The combination of the two allows it to provide more efficient AI solutions than others.
Imagination Technologies: Imagination Technologies has also ramped up its AI chip efforts. Recently, it launched a new AI-powered BXT series of chips for data centers. The PowerVR-backed line of products has helped Imagination Technologies establish a foothold in the vision processing domain. The PowerVR solution combined with Neural Network Accelerators (NNA) is unleashing new ways to process vision data and will also enable new AISoC.
Intel Corporation: Intel is a leader in the server and data center SoC market. Even though it is getting stiff competition from other vendors, Intel has been able to provide the industry with breakthrough AI chips. Even though the Nervana series of AI chips did not work out as planned, it has big plans with Habana’s line of products. The manufacturing capability of Intel allows it to ensure that there is always a new way to design and manufacture AISoC. Intel Xeon’s line of products has also shown the AI world how smaller SoCs are capable of running workloads on high bandwidth memory. With shrinking transistor size and Intel’s plan to move beyond 7nm, there will be elegant AISoC coming out.
Infineon Technologies: Infineon Technologies is going big in the AI Chip domain. It has established an AI development center in Singapore and also has a series of MCU designed with AI in mind. Low-cost MCU running with AI capability is the perfect solution for portable smart devices like cameras, drones, and smart speakers. AISoC with inbuilt MCU is another avenue Infineon is capable of exploring.
Marvell Technology Group: Marvell has launched a series of ASIC-based accelerators to cater to the AI data demand. The custom ASIC solutions use high-speed interconnects and innovative packaging to optimize performance and cost. On top of that, Marvell has a strong collaboration with TSMC to provide 5/7/14 nm AI ASICs, which allows it to pitch a wide portfolio to the growing AISoC market.
MediaTek: MediaTek’s Helio line of AI chips targets edge computing on the go. It is also planning to use the solution for the 5G market. Apart from the hardware products, MediaTek also provides hardware-oriented design solutions like NeuroPilot to make the most of its AISoC with AI Processing Unit (APU).
Microsoft: Microsoft’s hardware division has provided many AISoC solutions to the market. Project Brainwave is another such solution that allows the use of FPGA and ASIC to speed up the training. Microsoft also has plans to develop a tiny AI chip in collaboration with Sony. It may very well pave a new way for nanoelectronics well beyond what is available now.
NVIDIA: GPUs have single-handedly accelerated the growth of AI research. NVIDIA has been one of the leaders that showcased how to train the data set faster using GPU architecture. Apart from catering to the data centers, NVIDIA also provides AI-enabled SoCs for smart cars. A few months back, NVIDIA also unveiled the cost-efficient A100 architecture for the industry. With its acquisition of ARM, NVIDIA is on track to bring AI to the low-power AISoC soon.
NXP Semiconductors: NXP has several MCUs and MPUs optimized for AI applications and targeted for the automotive and smart industry. NXP’s i.MX series provides ML and DL optimized solutions. With growing semiconductor cost in automotive, NXP stands to gain an advantage with its wide range of AI-enabled AISoC solutions.
Qualcomm: As mobile AI is growing, Qualcomm is taking advantage of it by providing On-Device AI accelerators. Qualcomm has also taken steps towards a cloud AI chip solution. It launched Cloud AI 100 chips to showcase its new architecture design for AI and data centers. With a stronghold on the mobile business and AI chips already out, Qualcomm can spring a surprise and enable new data centers that are not only AI-enabled but are also powered by low-power and efficient AISoC.
Samsung: Samsung has fingers in many pies, from the in-house design of chips to the capability of manufacturing chips for its products and the world. Like Qualcomm, Samsung has been pushing for an AI chip to enable On-Device AI. It has also collaborated with Baidu to develop a server-class AISoC. The advantage of owning a foundry allows Samsung to innovate end-to-end, and it will be vital to see if it goes into the data centers’ AISoC chip design and development.
Tesla: Tesla already has the smartest AI-enabled cars out in the market. It has already designed an in-house AI chip to cater to Tesla’s growing need to provide more advanced and safe autonomous car driving solutions. Rumors say that Tesla would do away with cars and focus on an AISoC solution that can make any vehicle an autonomous one. Whether or not it will end up happening, Tesla’s AISoC will push the innovation around the self-driving car.
Texas Instruments: Like NXP and Infineon, TI is also providing Edge AI chips that cater to the 5G market. TI’s manufacturing capability fueled with low power techniques is going to provide a way forward to the industry on how to innovate AISoC with low power consumption.

The above summary shows how established companies are innovating and launching AISoC. The cost to establish a FAB-LESS semiconductor startup has gone down. The advanced EDA tools provide the ability to test ideas in the shortest possible time. The open RISC-V ISA is also helping companies innovate without investing in a royalty-based ISA.

All this has led to an increase in the number of FAB-LESS semiconductor startups that are coming up with new semiconductor chip designs and solutions to cater to the AISoC market. These startups have already got traction, and some are even collaborating with established companies to test the solutions.

Below is the list of some of the top startups coming up with silicon level technology to drive AISoC design:

AlphaICs: AlphaICs is focusing on Edge AI and has designed an AI Processor that finds application as both a mobile and a data center solution. AlphaICs calls its AISoC the Real AI Processor (RAP).
Alphawave: Alphawave provides Digital Signal Processor (DSP) solutions that are suited for high-speed performance and are low on power consumption. DSP provides audio/video processing, and with Alphawave’s AppolloCORE IP, semiconductor companies can build AISoC with an onboard accelerator. Alphawave was also the winner of TSMC’s Awards for Excellence in Accelerating Silicon Innovation.
Blaize: Blaize is another startup providing an Edge AI solution that is built for AI workloads. Blaize’s Graph Streaming Processor (GSP) is a power-efficient and adaptable core that caters to AI, ML, and DL needs on the fly.
Cambricon Technologies: Cambricon used to provide processors to Huawei before Huawei moved to its own in-house silicon design house, HiSilicon. Since then, Cambricon has developed several general-market AI products catering to mobile and cloud. Their Cloud AI platform provides flexibility and adaptability. With more than 100 million smartphones and servers powered by Cambricon, it is going to be vital for the AISoC world.
Cerebras Systems: Cerebras uses Wafer-Scale Engine technology to deliver a supersonic deep learning experience. It is benchmarked to be 1000 times faster than a GPU. Cerebras’ unique interconnects, memory, and package technology are pitched to break many records in computing shortly.
EdgeQ: EdgeQ is taking a different approach to Edge and 5G by fusing both into a single AI-powered chip. This will massively off-load the task from data centers to Edge Computing. With 5G rollout already in progress worldwide, the solution is at the right time for the right market.
GrAI Matter Labs: GrAI Matter is targeting robotics, X-Reality, and the drone market by providing an Edge AI Processor. The solution provided by GrAI Matter has ultra-low latency and is low power, two features critical for Edge processing.
Graphcore: Graphcore has accelerator products that cater to machine learning and artificial intelligence by leveraging its proprietary Intelligence Processing Unit (IPU) technology.
Groq: Groq leverages its Tensor Streaming Processor (TSP) to provide small programming cores that are packed in a tiny package with high-speed memory and perform fast operations.
Hailo: Hailo is one more startup focusing on Edge AI. Hailo claims its Hailo-8 Edge AI processor can provide 26 tera-operations per second (TOPS) without compromising the area and power efficiency.
Horizon Robotics: Journey and Sunrise processor architecture from Horizon Robotics is designed to provide an AI-enabled Brain Processing Unit (BPU). Journey BPU is designed for the automotive industry, while Sunrise is for the IoT market.
Kneron: Kneron provides an Edge AI solution and plans to take on Google and others with its AI-enabled chip. Kneron claims its KL720 AI SoC has the highest performance to power ratio in the market.
Lightelligence: Lightelligence is taking a photonics approach to solving AI processing problems. It has already released an optical AI accelerator but is yet to see mass production for the market needs.
Lightmatter: In the same domain as Lightelligence, Lightmatter also plans to use electronics, photonics, and algorithms to provide processor and interconnect that is faster and more efficient than traditional AISoC.
Luminous Computing: Still in stealth mode, Luminous also plans to leverage photonics to speed up AI workload training.
Mythic AI: Mythic uses Intelligence Processing Units (IPUs) to provide power-efficient, performance-oriented, and cost-efficient AISoC. Mythic Analog Matrix Processor is already available to order and will find use in Edge AI.
NUVIA: NUVIA is a stealth mode startup focusing on ARM-powered CPUs to drive AI workload. More details about its architecture are yet to be known.
SambaNova Systems: SambaNova is another startup that uses a Reconfigurable Dataflow Unit (RDU) to enable new models without going into the algorithm complexity. SambaNova’s Cardinal SN10 is designed to eliminate constant data caching and excess data movement, something the majority of the SoC today suffer from.
SiMa.ai: SiMa wants to make greener low-power AISoC for Edge AI. It is yet to share product details. SiMa plans to launch new silicon early next year.
SimpleMachines: To accelerate AI/ML/DL application performance, SimpleMachines leverages Composable Computing. The SimpleMachines AISoC solution enables flexible and powerful real-time computation.
Synthara AG: Synthara leverages RISC-V ISA to provide ultra-low power ASIC for Edge AI.
Syntiant: Syntiant provides an ultra-low power AI processing solution for any battery-powered device, from earbuds to laptops. Syntiant Neural Decision Processors™ (NDP10x) is a tiny silicon that is always-on.
Tenstorrent: Tenstorrent’s Grayskull AISoC provides fast AI inference to enable accurate and faster prediction on the go. It is expected to go into production soon.
Wave Computing: Wave wants to accelerate AI computing with the help of MIPS architecture. The M-Class product from Wave Computing provides AISoC using MIPS architecture for IoT and smart devices.

Both established companies and startups are showcasing to the world new ways to design chips and drive data processing. All this is making software development, training, testing, and data analytics faster. The AISoC from all these vendors are also providing avenues for low-cost AI-powered mobile devices and data centers.

However, there are several challenges ahead.

Picture By Chetan Arvind Patil

THE CHALLENGES AHEAD FOR AI SEMICONDUCTOR CHIPS

The majority of the challenges AISoCs face are the same old problems that general-purpose CPUs and GPUs faced as the technology at the silicon level advanced. The new AISoC solutions from both the established companies and startups are eventually going to run into these challenges.

Cost: Designing and establishing an AISoC proof-of-concept using a software simulator demands resources and pushes up the cost of development from FAB to OSAT. The cost of owning smartphones and running data centers is already high. On top of that, any new AI-powered solution will add cost for the customer. The technology node required to enable a high number of processing units to speed up training and inference is eventually going to cost money. AISoC vendors need to balance the cost of manufacturing in order to break even in the market. On top of all this, the amount of competition in developing new AISoCs means time to market is more vital than ever.

Bottleneck: The reason to move away from general-purpose CPUs and GPUs was the memory and interconnect bottleneck. A few of the startups listed above are trying to remove these bottlenecks. However, given the speed at which new AI workloads are getting generated, there is a high chance that bottlenecks will still exist. It will be vital to ensure that the new type of AISoC that both the established companies and startups are envisioning does not have any bottlenecks.

Bandwidth: Bringing the data closer to the processing units (of any type) is the key to processing AI data faster. However, such a task requires high-speed memory with large bandwidth. The new AISoCs are incorporating new processing units like RAP, GSP, TSP, BPU, AMP, RDU, NDP, and so on, but there is no clear strategy or detail on how the data communication bandwidth is improved. Maybe such details are proprietary.

Programming: In the end, no AISoC can process data efficiently if the workload is not optimized for the target architecture. While a few AISoC vendors pitch their products as requiring no change to the data or framework before running on their architecture, the reality is that every architecture ends up needing some form of optimization. All this adds to the time to develop data solutions.

Manufacturing: As the new AISoCs come out in the market, many of these will end up using advanced nodes beyond 7nm to provide high speed. Advanced packaging technology also is required to operate the AISoC within the thermal budget. Both the complex technology node and package technology will drive a high manufacturing cost. Apart from this, balancing yield and cost will be essential to ensure AISoC development is viable.

Power Consumption: AISoCs pack billions of transistors, which demands better cooling. The majority of AISoCs can make do with liquid cooling, but when such AISoCs are connected together to form data centers, the cost of running the data centers goes up. Hopefully, greener technologies will be able to run such data centers. However, AISoCs will still be challenged to overcome the area, power, and thermal walls.
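
The yield and cost balance raised under Manufacturing can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes a simple Poisson yield model and a common dies-per-wafer approximation, neither of which comes from this post, and every number (wafer cost, die area, defect density) is a made-up placeholder rather than real foundry data.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate gross dies per wafer, with a simple edge-loss correction."""
    radius = wafer_diameter_mm / 2
    gross = (math.pi * radius ** 2) / die_area_mm2
    edge_loss = (math.pi * wafer_diameter_mm) / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model (an assumption): Y = exp(-A * D0), with A in cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def cost_per_good_die(wafer_cost: float, wafer_diameter_mm: float,
                      die_area_mm2: float, defects_per_cm2: float) -> float:
    """Wafer cost spread over the dies expected to pass test."""
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * poisson_yield(die_area_mm2, defects_per_cm2))
    if good_dies <= 0:
        raise ValueError("die is too large for this wafer")
    return wafer_cost / good_dies


# Hypothetical inputs: a 300 mm wafer costing 10,000 units, 0.2 defects/cm^2.
for area in (100, 400):
    cost = cost_per_good_die(10_000, 300, area, 0.2)
    print(f"{area} mm^2 die: {cost:.2f} per good die")
```

Under these assumed inputs, a larger die loses twice: fewer dies fit on the wafer, and each one is more likely to contain a defect. That is why yield and cost have to be balanced together.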

No matter what, AISoC in coming years is going to be the semiconductor domain that will innovate and provide elegant semiconductor solutions that will challenge the end-to-end semiconductor design and manufacturing.
December 13, 2020
The Need For Semiconductor As A Service

Photo by Laura Ockel on Unsplash

THE SEMICONDUCTOR AS A SERVICE

The software industry has adapted to the demands of businesses and consumers by changing its licensing and product delivery model over the last three decades. The post-1990 era saw standalone software sold for a one-time fee with no incremental feature updates except security-related ones, termed the pay-and-use model. Then post-2000, with the proliferation of the internet, the software license model moved to monthly or yearly payments and came with feature and security updates. The software industry termed it the Software-As-A-Service model. Post-2010, the software industry adapted to the changing business and extended the licensing model from software to platform, which came with feature and security updates not only for the software itself but also for the platform the software runs on. This has allowed software developers to provide more over-the-top services.

In comparison to the software industry, the hardware industry (mainly the semiconductor industry) has not adapted its product delivery model. It has remained constant, driven by build and ship, with no ability to provide new hardware features on the go. If there are security flaws in the hardware, then those are mitigated by an Over-The-Air (OTA) update. Consumers and businesses buying a piece of silicon get locked in with the product. It is also not easy to provide new features at the silicon level. On top of that, the majority of the products shipped by the semiconductor industry end up getting used differently based on the hardware company’s needs.

Semiconductor products (from CPUs to NPUs to GPUs to ASICs to FPGAs to DSPs to Mixed/Analog/Digital devices) have a long design and manufacturing cycle. This demands a long-term vision of future market needs and aligning the investment in design and manufacturing accordingly. To keep up with market demand, semiconductor products need to be more adaptable, with in-built features that become relevant a few years down the line and can be activated post-production.

Semiconductor-As-A-Service Is More Possible Today Than Ever Due To The Shrinking Transistor Size That Allows More Silicon Features To Be Built In Today For Future Needs.

The approximate life of a smartphone is anywhere between three and five years. However, the majority of companies stop providing critical software updates well before that, which makes the smartphones redundant. The launch of new smartphones with new silicon and software grabs consumers’ attention, and they end up buying a new smartphone with the latest silicon features.

Imagine having adaptable silicon with built-in features that can be unlocked a few years later, making the hardware as new as the software. Either vendors or consumers can decide which silicon features should be activated and how they help the device performance. Such a process will allow the semiconductor industry to deliver silicon services under a Semiconductor-As-A-Service model.

Semiconductor-As-A-Service – A product delivery business model for the semiconductor industry that allows silicon to be designed and manufactured with in-built features that can be unlocked in the future as market demand and software requirements align. For example, more graphics capability for new gaming applications. These silicon features can be enabled with the help of software updates and require a subscription or a one-time payment license. The list of features can be endless, from more cache memory to DRAM to extra processing cores to an additional GPU for gaming applications to a secondary cellular (perhaps 6G) antenna. The shrinking transistor size and the growth of heterogeneous integration as a More-Than-Moore (MTM) solution make such features in silicon possible. Silicon area with extra features can reside inside a smartphone launched in 2020 as an in-built hidden feature, with the option to enable it in 2022 as long as consumers are willing to pay. Such a service can also be bundled with software features, wherein the smartphone manufacturers can tie in new features like extra memory or storage.

Picture By Chetan Arvind Patil

THE PROCESS OF SEMICONDUCTOR AS A SERVICE

Semiconductor-As-A-Service implementation can unlock a plethora of opportunities not only for the semiconductor industry but also for the software industry. However, implementing Semiconductor-As-A-Service requires a specific process to be followed from design to manufacturing. It also requires the semiconductor industry to take risks by using advanced technology nodes today rather than a few years down the line. Using advanced technology is the key to fitting in more silicon features that can be unlocked post-production, as it packs more silicon into the smallest possible area and thus provides more features at the transistor level.

Semiconductor-As-A-Service Process:

Identify Future Software Needs – These software features should be those that become bottlenecks for consumers. One example is understanding whether consumers will need more memory than the product shipped with, so that as data-driven applications grow, enabling extra memory at the silicon level can cater to the software demand. The same goes for CPU and GPU processing power.
Design Silicon With In-Built Hidden Features – Once future software needs are identified, pack the silicon with features that get unlocked in the future. The majority of these features will reside inside the System-On-A-Chip (SoC), as the active components are the ones that can provide more benefits of service-based features than passive components. Usage of an advanced technology node is key to enabling such silicon-level features.
Ability To Enable The In-Built Hidden Silicon Features – Incorporating in-built hidden silicon features requires not only designing them with secure memory to store the keys that activate features but also a secure manufacturing process. The secure way of design and manufacturing ensures that there are no security flaws that can be exploited by hackers.
Innovative Manufacturing And Packaging – The critical piece of the Semiconductor-As-A-Service process is to ensure that the manufacturing flow and the packaging technology use advanced techniques to consider the effects when more silicon area is activated. Activating new features (more memory or processing capability) can have significant power and thermal effects.
Product Cost – Planting more silicon with the expectation that it will get used in the future under a pay-as-you-use service is a business risk. It is vital to price such products so that the design and manufacturing costs invested get recovered even when the in-built hidden features do not get utilized.

Above are the five key process steps that lay the foundation of Semiconductor-As-A-Service. It has the potential to make the silicon more adaptive. It will require massive research and development before the industry can use it as a real-world solution.
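
The third step, enabling a hidden feature with a securely stored key, can be sketched in software. This is a minimal illustration, not how any vendor does it: the device key, device ID, feature names, and both function names are hypothetical, and a production design would more likely use asymmetric signatures verified by a hardware root of trust rather than a shared secret.

```python
import hashlib
import hmac

# Hypothetical per-device secret. On real silicon this would sit in
# secure, write-protected storage, not in application code.
DEVICE_KEY = b"example-device-secret"


def make_activation_token(device_key: bytes, device_id: str, feature: str) -> str:
    """Vendor side: sign the (device, feature) pair after payment."""
    message = f"{device_id}:{feature}".encode()
    return hmac.new(device_key, message, hashlib.sha256).hexdigest()


def unlock_feature(device_key: bytes, device_id: str, feature: str,
                   token: str, enabled: set) -> bool:
    """Device side: enable the feature only if the token verifies."""
    expected = make_activation_token(device_key, device_id, feature)
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(expected, token):
        enabled.add(feature)
        return True
    return False


enabled_features = set()
token = make_activation_token(DEVICE_KEY, "device-001", "extra_gpu_cores")
print(unlock_feature(DEVICE_KEY, "device-001", "extra_gpu_cores",
                     token, enabled_features))
print(sorted(enabled_features))
```

Binding the token to a device ID means a token bought for one device does not unlock the same feature on another, which matters if features are sold per device.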

Picture By Chetan Arvind Patil

THE NEAR-TERM IMPACT OF SEMICONDUCTOR AS A SERVICE

If Semiconductor-As-A-Service is implemented and widely used, then it has the potential to transform the computing industry.

The ability to enable an extra layer of processing power on the go provides a new way to process data. With an expected 3.5 billion 5G subscribers by 2026, data consumption will skyrocket, and having silicon with in-built hidden features to cater to such high processing and memory demand will take computing to another level. Semiconductor-As-A-Service can also provide data centers and OEM vendors with avenues to save cost and increase revenue by providing silicon-level services.

Semiconductor-As-A-Service Provides Avenues To Put Future Silicon Technology In Today’s Silicon Area

FABs, FAB-LESS, IDMs, OSATs, and ATMPs will be able to use technology designed for future silicon today. It will help them understand its impact and usage before launching future silicon technology on a large scale. The semiconductor industry has already started embracing chiplets and heterogeneous computing. These two semiconductor and computing techniques can provide a perfect starting point where more silicon can be incorporated to use it in the future.

IP-based semiconductor businesses are going to benefit the most, as the model will allow designers to incorporate more features that can be locked and unlocked as needed. FAB-LESS companies will generate more business by providing vital features as-a-service.

Semiconductor-As-A-Service also means every device out in the market can be different from the others, as silicon features can be enabled and disabled to the consumer’s liking.

December 6, 2020
The Importance Of End-To-End Semiconductor Cluster Ecosystem

Photo by Laura Ockel on Unsplash

THE END-TO-END SEMICONDUCTOR CLUSTER ECOSYSTEM

The semiconductor industry is vital for high-tech advancement. From smartphones to satellites, a small piece of silicon forms the base for millions to trillions of data points. It is why worldwide, the semiconductor industry is a Key Enabling Technology (KET) provider. Semiconductor product development requires various resources to come together. With the growing demand for smart hardware, the need to develop these resources in-house is more critical than ever.

In semiconductors, no single country wants to be 100% reliant on others. Countries are ramping up in-country semiconductor design and manufacturing efforts.

The complexity of both the design and the manufacturing aspects of semiconductors makes it a tough business. It takes years and decades to come up with a turnkey ecosystem to drive in-country semiconductor design and manufacturing. The cutting-edge technology that is required to become self-reliant in semiconductor design and manufacturing demands a radically different approach than incentive-based schemes, which the majority of the governments provide.

The End-To-End Semiconductor Cluster Ecosystem Requires In-Country Development And Growth Of Semiconductor To Drive Key Enabling Technology

End-To-End Semiconductor Cluster Ecosystem: An end-to-end semiconductor design, manufacturing, and support ecosystem that enables seamless semiconductor product development. It requires different components of the semiconductor product development to be done in-country rather than globally. It drives in-country economic and talent development and is cost and time effective.

The End-To-End Semiconductor Cluster Ecosystem is what countries should focus on building to pitch themselves as a one-stop destination for all semiconductor solutions. However, it is easier said than done. The list of different types of resources and solutions that are required to develop a semiconductor cluster ecosystem is long. Depending upon the market size and focus area, countries can build smaller, more focused end-to-end semiconductor cluster ecosystems that have all the components from semiconductor design to manufacturing to customer delivery.

Picture By Chetan Arvind Patil

THE COMPONENTS OF THE END-TO-END SEMICONDUCTOR CLUSTER ECOSYSTEM

Creating a semiconductor cluster ecosystem is not easy. There are different components required to ensure that the environment supports the semiconductor business. Following are the major components of the semiconductor cluster ecosystem:

RESEARCH AND DEVELOPMENT:

Research and Development (R&D) is key to both basic and applied science innovation. R&D requires the cooperation of government, academia, and industry. Given how complex semiconductor product development is (from technology node to packaging to power requirements), continuous and steady R&D spending is vital as it forms the base of the semiconductor cluster ecosystem.

According to the Semiconductor Industry Association, in 2019, the U.S. semiconductor industry's R&D spending was 16.40% of total sales. Europe's spending was 15.30% of total sales, while spending in Taiwan, Japan, China, and Korea was 10.30%, 8.40%, 8.30%, and 7.70%, respectively. It clearly shows the importance of R&D spending and how it helps drive leadership in the semiconductor business.

Countries wanting to implement the semiconductor cluster ecosystem need to increase R&D spending by collaborating with academia and industry, to drive advanced solutions for the market.

DESIGN (FAB-LESS/EDA/IDM):

Without semiconductor design, there is no manufacturing. Countries around the globe are attracting businesses to design in-country. This requires setting up FAB-LESS businesses that can drive the design of Analog, Digital, Processor, Memory, and Sensor-based products. To cater to the needs of FAB-LESS companies, EDA companies are required to provide software-based tools that drive circuit-to-layout design, simulation, and validation. Apart from FAB-LESS companies, there are several IDMs (Intel, NXP, etc.) that cater to both the design and manufacturing aspects of semiconductors.

The development of an in-country design ecosystem requires a talent pool. This demands universities with excellent infrastructure that can provide the deep technical training required to gain expertise in semiconductor engineering.

MATERIAL:

No FAB or OSAT in the world produces the materials required to bring the silicon to life. Different chemicals, silicon, photomasks, gases, substrates, compounds, etc., are required to develop the wafers and packaged materials. There is a big dependency on specific countries and companies that provide such materials.

Semiconductor material development and procurement also demand a good understanding of the engineering aspects and, as discussed, require heavy R&D activities within the country where the materials will eventually get used, either by the FAB or the OSAT.

EQUIPMENT:

Semiconductor equipment is a billion-dollar market. Both FAB and OSAT require heavy machinery to process and assemble wafer silicon. ASML is the world's largest supplier of lithography systems for the semiconductor industry, while other major equipment suppliers include ASM, Applied Materials, and TEL. On the other hand, ADVANTEST, TEL, and Teradyne are the largest suppliers of ATE-related equipment.

Both FAB and OSAT equipment are vital to ensure the materials and design eventually get made in the form of a product. A country with a stronghold on the semiconductor equipment manufacturing market is key to anything semiconductors.

FAB:

Fabrication of semiconductor devices requires dedicated facilities with large clean rooms. The investment to create such a facility is large and is the primary reason why there are only a handful of semiconductor FABs around the world. Even out of the existing FABs, not all are equipped to handle the advanced technology nodes that the semiconductor industry has ventured into.

TSMC, Intel, GLOBALFOUNDRIES, and Samsung Semiconductor are competing with each other to grab the opportunities presented by technology nodes of 5nm and beyond. To make countries self-reliant in semiconductors, FABs play a vital role. This has pushed governments without any FAB facilities to provide incentives to set up new advanced FABs. However, setting up a FAB also requires a supporting ecosystem, and this is why countries should focus on the cluster-based ecosystem that provides in-country end-to-end semiconductor solutions.

OSAT:

Outsourced Semiconductor Assembly And Test (OSAT) is as important as FAB. Packaging the products with the right technology enables long life. Testing every die on the wafer is vital to ensure there are no reliability issues or test escapes. OSATs deliver defect-free parts to the customer. They drive the back end of semiconductors, which in itself is a billion-dollar market.

Historically, OSATs have been located in the Asia Pacific and have been dependent on America and Europe due to the R&D and design lead these two continents hold. For a semiconductor cluster ecosystem, all the major components need to be catered to, not only specific ones. This is why OSATs are trying to get into FAB and are also investing in in-house design.

ATMP:

Assembly, Testing, Marking, and Packing (ATMP) is different from OSAT. OSATs take the bare wafer silicon and convert it into packaged products, which are then shipped to the ATMP houses. ATMP houses receive packaged semiconductor products from different OSATs and assemble them on a printed circuit board (PCB). All the semiconductor devices are connected to form a working computer system, and clear marking details are put on the PCB to ensure traceability of devices. As the last step, the PCB is covered with an aluminum or plastic body before being shipped to the customer in a fancy box.

China is the leader in ATMP. India is another upcoming destination. Dell and Foxconn are the world’s largest ATMP houses. Having ATMP houses in-country provides economic development, but the benefit is negated when the country becomes a 100% importer of semiconductor products. This is what has happened with India’s ATMP ecosystem.

MISCELLANEOUS:

Apart from all the major components, there are some minor components that are also critical for the semiconductor cluster ecosystem. These include logistics, distribution, and enterprise-level software. Having delivery and development houses for these activities is also critical in ensuring an end-to-end semiconductor cluster ecosystem. Given that these solutions are driven mostly by software in today’s day and age, the majority of countries have both development and R&D centers catering to the future of how to efficiently handle logistics and distribution with the help of data and software.

SUMMARY: An end-to-end semiconductor cluster ecosystem requires all of the above components to be in close proximity. However, as of today, there is not a single full end-to-end semiconductor cluster ecosystem in the world. The majority of semiconductor cluster ecosystems have one to at most three of the above components. Given the race between countries to attract the world’s best semiconductor business and talent, the focus on the end-to-end semiconductor cluster ecosystem needs to increase by leveraging facilities within the same location or country. Having more FABs and then relying on other countries for OSATs and ATMPs is never going to make a single country the destination for all semiconductor needs, and that is what the majority of countries have been trying to achieve in the last two to three years. Unfortunately, that is not possible until an end-to-end semiconductor cluster ecosystem is built in-country.

Picture By Chetan Arvind Patil

THE ACTIVE SEMICONDUCTOR CLUSTER ECOSYSTEM

There are a handful of semiconductor cluster ecosystems located in different countries. However, these clusters do not cater to all the components discussed above. It would not be valid to call these centers a semiconductor cluster ecosystem, but they do show the importance of having one or more semiconductor components in the vicinity.

Following are a few active semiconductor cluster ecosystems that are not end-to-end:

Intel – Portland, Oregon, USA And Chandler, Arizona, USA: Intel has advanced FABs in Portland, Oregon, and Chandler, Arizona. There are two big universities in the proximity of these two FAB locations: Portland State University and Arizona State University. Cross-industry and academia collaboration at these two locations has led to the launch of several innovative semiconductor solutions. The exchange of talent for research activities has also helped. Intel’s presence in these two locations guided the formation of a semiconductor support environment that has helped its FAB execution. This is also the primary reason why TSMC has chosen Arizona as the destination for its next 5nm plant.

ASE Global – Kaohsiung, Taiwan: ASE Global has multiple OSAT facilities in Taiwan. The Kaohsiung plant stands out due to its proximity to other package technology solution providers like Amkor. The competition has helped with the development and availability of the semiconductor raw materials required to smoothly operate an OSAT facility.

TSMC – Hsinchu, Taiwan: TSMC has several FABs around the globe, with the majority located in Hsinchu. This has helped TSMC develop an ecosystem that has allowed universities and OSATs nearby to thrive. Having OSAT and FAB in the same location also reduces the cost and time of product development.

Newport Wafer Fab – Newport, United Kingdom: Newport Wafer Fab is the latest addition to the semiconductor ecosystem and promises to be the one-stop destination for FAB needs in the UK region. It has tied up with Cardiff University to enable future compound semiconductor development, showcasing why having universities nearby helps.

Samsung – Gyeonggi, South Korea: Samsung, like TSMC, has FABs in different parts of the world, with the majority located in Gyeonggi. South Korea, being home to both OSAT and ATMP houses, has allowed Samsung to take advantage of the in-country semiconductor ecosystem.

TAKE AWAY: The above examples show the importance of having one or more semiconductor cluster ecosystem components in proximity. Imagine having all the semiconductor components in one location, and that too within a single country. The benefits from employment, development, and growth will be immense. Whether or not such an ecosystem ends up getting developed, countries are for sure racing to attract the best talent and semiconductor businesses to drive in-country semiconductor growth.

Picture By Chetan Arvind Patil

THE WAY FORWARD FOR END-TO-END SEMICONDUCTOR CLUSTER ECOSYSTEM

The semiconductor industry is going through massive critical changes. From mergers to acquisitions to new companies to new FABs, all this is shaking up the semiconductor business.

Traditionally, semiconductor design and manufacturing have been all about specific regions/countries in the world having a stronghold on design, manufacturing, or equipment. Post-2020, the story is going to change. The majority of countries have already started chasing giants of the semiconductor industry to set up their design or manufacturing houses.

The Country With The End-To-End Semiconductor Cluster Ecosystem Will Lead In The Digital Technology World.

Governments need to develop their country as an end-to-end semiconductor cluster ecosystem, with a solution for every component of the semiconductor development cycle. Having one facility and not the other is only going to make the new facilities in the new country dependent on the old facilities in other countries.

The country that can create an end-to-end semiconductor cluster ecosystem is going to have an advantage over others and will lead the digital technology competition.

November 29, 2020
The BIG-5 Are Becoming Semiconductor Companies
Photo by İsmail Enes Ayhan on Unsplash

THE NEED TO PROCESS DATA

Internet usage is growing. Every new user generates a new type of data. The technology companies are always eager to process and understand new consumer behavior. It requires continuous research and development of both the software and the hardware.

Software development has advanced in the last two decades. It has kept pace with the need to understand and process data due to the development of software libraries and frameworks. The large amount of data that has been generated post-2010 has helped the Deep Learning (DL), Machine Learning (ML), and Artificial Intelligence (AI) frameworks train networks, and that is now allowing new data to be processed faster and more accurately.

Hardware is vital in ensuring that the processing of data using training and prediction frameworks occurs in the shortest time possible. It requires a massive amount of computing. The majority of the technology companies now rely on massive data centers equipped with advanced computer architectures.

BIG-5 – Facebook, Amazon, Apple, Microsoft, Google – FAAMG

To fully utilize computer architectures, an in-depth architecture-level understanding is required. It is not always possible to do so, as the data centers still run general-purpose computer architectures that do not cater to different types of data the big technology companies have to process.

The disconnect between the software, the hardware, and the data has prompted the need to move from general-purpose SoC to application-specific SoC. Not all data companies are capable of setting up a dedicated team that can focus on in-house silicon development to come up with an application-specific SoC.

To overcome the reliance on semiconductor companies, the BIG-5 (FAAMG) technology companies have started developing (or have already developed) in-house SoCs with the hope of opening up the silicon to different data companies around the world.

Picture By Chetan Arvind Patil

THE PUSH FOR IN-HOUSE SILICON

Two major factors drive the push to develop new computer architectures (silicon):
- Memory
- Parallel computing
Memory:
- Modern applications are becoming memory intensive and also demand faster computation. To process requests from memory-intensive applications in the shortest possible time, the data needs to reside closer to the processing unit.
- The time to bring the data from SSD to DRAM to cache adds cycles and delays the processing of the data. To overcome this bottleneck, semiconductor companies have implemented the following three techniques:
  - Cache Prefetching:
    
    Bring the data near the processing unit in advance to minimize cycle time
  - Increasing Level Of Cache:
    
    Add Level-1 (L1), Level-2 (L2 – Shared), and Level-3 (L3 – Shared) small (KB/MB) cache memory to improve memory prefetching speed
  - Enable High Bandwidth Memory:
    
    An extra layer of large high-speed memory between Last Level Cache (LLC – Either L2 or L3) and DRAM to speed up prefetching
- All three of the above techniques improved the response time of processing units. However, as the application data started growing, cache and memory thrashing became a new hurdle.
- Multiple processing units sharing the same level of memory started evicting each other’s data while competing to process requests faster. On top of all this, the lack of interconnect bandwidth added further bottlenecks.
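
The effect of the cache techniques above, and of the thrashing that undermines them, can be estimated with the standard average memory access time (AMAT) calculation. The sketch below is illustrative only: the function name is hypothetical, and the latencies and miss rates are assumed round numbers, not measurements of any real SoC.

```python
def average_access_time(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_time_cycles, miss_rate) tuples ordered from the
            level closest to the processing unit (L1) outward.
    memory_latency: cycles to fetch from main memory (DRAM).
    """
    # Work inward from DRAM: each level adds its hit time plus the
    # fraction of accesses that miss and pay the next level's cost.
    cost = memory_latency
    for hit_time, miss_rate in reversed(levels):
        cost = hit_time + miss_rate * cost
    return cost


# Assumed numbers: (hit time, miss rate) for L1, L2, L3; DRAM at 200 cycles.
healthy = [(4, 0.10), (12, 0.40), (40, 0.50)]
# Thrashing modelled as higher miss rates at the shared levels.
thrashing = [(4, 0.10), (12, 0.80), (40, 0.90)]

print(f"healthy:   {average_access_time(healthy, 200):.1f} cycles")
print(f"thrashing: {average_access_time(thrashing, 200):.1f} cycles")
```

Even with identical hardware latencies, raising the miss rates at the shared levels pushes the average access time up noticeably, which is the hurdle the last two points describe.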
Parallel Computing:
- Apart from being memory intensive, applications have become compute-intensive too. It prompted the need to have multiple processing units within the same SoC. Running multiple data requests on a single processing unit or on two separate processing units provided a way to accomplish the task in the shortest possible time.
- The processing units still relied on the low-level memories to bring the data to be processed quickly. This calls for new SoC design techniques that allow the sharing of cache and high bandwidth memories in elegant ways without adding latency.
- Adding more processing units to a single SoC is not the solution. On top of that, developers have to keep coming up with smart ways to distribute the data to multiple SoCs to speed up the processing.
- Distributed computing is what the majority of the technology companies have adopted to ensure the data is processed quickly. It means a massive number of servers with thousands of SoC and a large amount of memory. Over time this has increased the cost of operating data centers.
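
One standard way to see why adding more processing units is not a complete solution is Amdahl's law, which this post does not invoke but which fits the argument: any part of the workload that cannot be parallelized caps the overall speedup. A minimal sketch, with an assumed 90% parallel fraction chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Theoretical speedup from Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0 to 1).
    units: number of processing units working on the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


# Assumed workload: 90% of the work parallelizes, 10% stays serial.
for units in (1, 4, 16, 64, 1024):
    print(f"{units:5d} units: {amdahl_speedup(0.9, units):.2f}x")
```

With a 10% serial share, the speedup can never reach 10x no matter how many units are added, so the returns from each extra processing unit shrink while memory sharing and data distribution costs keep growing.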
Even though semiconductor companies have come up with unique computer architectures in the last decade to cater to both memory- and compute-intensive applications, it has not been enough to adapt to the changing processing requirements of BIG-5.

The need to handle the memory and parallel computing demands of modern workloads and applications efficiently at the architecture level has pushed BIG-5 to go for in-house silicon.

THE STATUS OF IN-HOUSE SILICON

BIG-5 has been gearing towards the development of adaptive computer architecture for data and operating systems.

Facebook:

Facebook started working on in-house silicon a couple of years back. With a growing user base across multiple platforms (Instagram, WhatsApp, Messenger), Facebook ramped up silicon effort last year.

They have a silicon team that is focusing on Application-Specific SoC development that not only caters to data centers but also portable devices like Oculus.

Amazon:

Amazon Web Services (AWS) is one of the leaders in cloud solutions. Having customized SoCs is vital to ensure that consumers and enterprises can make the most of the wide range of computing services AWS provides.

Apart from AWS, Amazon’s growing range of Echo products is also pushing it to drive in-house silicon development. Amazon is betting big on ARM architecture to drive its silicon needs.

Apple:

Apple was always into silicon development. This year with the Apple M1 launch, they are making big bets on in-house silicon development that caters well to their need.

With Siri about to become the default search option on all the Apple devices, the need to have data-centric customized silicon will grow.

Microsoft:

Microsoft always had a keen interest in hardware. They already have a strong team of researchers focusing on hardware research. The Surface line of products has shown strong growth, and the SQ1 line of SoC establishes Microsoft’s goal of making Windows smoother to use on silicon.

Recently, Microsoft also announced a plan to develop a Secure Chip with the help of semiconductor giants Intel and AMD.

Google:

Like Microsoft, Google also has a dedicated team that has heavily contributed to silicon development via different computer architecture domains. They have already announced plans to develop in-house silicon for Pixel and Chromebook devices.

A few years ago, Google showed the world its Tensor Processing Units (TPUs), designed to speed up training on data sets using the TensorFlow framework. Google’s latest data shows they have been successful in doing so.

Picture By Chetan Arvind Patil

THE POSSIBLE FUTURE SCENARIOS

BIG-5 is betting big on in-house silicon development. This requires not only years of planning and investments but also a dedicated semiconductor development team and flow chain. Going forward there are two possible scenarios that BIG-5 might take:

Scenario 1:

BIG-5 will keep collaborating with semiconductor companies (Intel, ARM, AMD, and Qualcomm) to design silicon for their products and data centers with strict control over features and the manufacturing process. It will enable BIG-5 to enter the in-house FAB-LESS business model.

Scenario 2:

BIG-5 will slowly move away from semiconductor companies and spin off an in-house team with a full turnkey silicon development chain. It will be more like an IDM business model and might require the acquisition of existing semiconductor manufacturing units.

The second scenario is unlikely to occur soon. In a decade or so, BIG-5 may go big on the semiconductor business and try to keep themselves as in-house FAB-LESS silicon developers (while owning a piece of IDMs/FABs), which will ultimately play into the hands of FAB/Pure-Play Foundries like TSMC and GLOBALFOUNDRIES.

Whichever scenario ends up occurring, there will be exciting developments in computer architectures that will drive the semiconductor industry to new levels.
November 22, 2020
The Challenges And Way Forward For Computer Architecture In Semiconductor Industry
Photo by Luan Gjokaj on Unsplash

OVERVIEW

Computers are designed to provide real-time feedback to all user requests. To enable such real-time feedback, the Central Processing Unit (CPU) is vital. CPUs are also referred to as processing units or simply processors. These incredibly small semiconductor units are the brain of the computer and are capable of performing Millions/Billions of Instructions Per Second (MIPS/GIPS). A higher MIPS/GIPS figure means faster data processing.
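
As a rough illustration of the MIPS figure mentioned above, the sketch below computes it in two common ways: from a measured instruction count and run time, and from a clock rate and an average cycles-per-instruction (CPI) value. The function names and the sample numbers are illustrative assumptions, not measurements of any real processor.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """Millions of instructions per second for a measured run."""
    if execution_time_s <= 0:
        raise ValueError("execution_time_s must be positive")
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_hz: float, cycles_per_instruction: float) -> float:
    """The same metric derived from clock rate and average CPI."""
    if cycles_per_instruction <= 0:
        raise ValueError("cycles_per_instruction must be positive")
    return clock_hz / (cycles_per_instruction * 1e6)


if __name__ == "__main__":
    # Hypothetical run: 2 billion instructions retired in one second.
    print(mips(2_000_000_000, 1.0))
    # Hypothetical core: 3 GHz clock, 1.5 cycles per instruction on average.
    print(mips_from_clock(3e9, 1.5))
```

Dividing either result by 1,000 gives the GIPS figure. Note that MIPS only compares processors fairly when they run the same instruction set and workload.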

A lot of processing goes on inside these processing units. With the advancement of the technology nodes, more processing units are being glued together to form System-On-A-Chip (SoC). These SoCs have different individual units like GPU, DRAM, Neural Engine, Cache, HBM, ASIC accelerators, apart from the CPU itself.

It is incredibly difficult to design an SoC that has the best of two important worlds of computer architecture: Power and Performance.

Both in academia and the industry, Computer Architects (responsible for design and development of next-gen CPU/SoC) play a key role and are often presented with the challenge of understanding how to provide faster performance at the lowest power consumption possible. It is a difficult problem to solve.

Battery technology has not advanced at the speed at which SoC processing capability has. Shrinking technology nodes offer computer architects the opportunity to pack in more processing power, but at the same time, they also invite issues related to the thermal and power budget.

All this has led semiconductor companies to focus on design challenges around the power and performance of the SoC.

CHALLENGES

The semiconductor industry has been focusing on two major SoC design challenges:
- Challenge 1: Efficient and low latency SoC design for portable devices
- Challenge 2: High throughput and performance oriented SoC for data center
Picture By Chetan Arvind Patil

Challenge 1:
- Portable:
  - Portable devices are constrained by battery capacity. Battery capacity has been increasing mainly because shrinking transistor sizes have shrunk the board inside these devices, leaving more room for the battery.
  - This has allowed the OEMs to fit in more lithium-ion capacity. However, to balance the form factor and portability, batteries cannot be scaled up forever. It is a challenge for OEMs to manage portability by balancing the battery size while also making the computer system efficient with low latency.
- Efficiency And Low Latency
  - To tackle efficiency and low latency, innovative designs are coming out in the market with the ability to adapt the clock and voltage domain depending on the application being executed by the user. It is no longer about how many cores are in the SoC, but more about how an application-specific core can provide a much better user experience than ever.
  - This has presented researchers with an interesting problem of improving the performance per watt (PPW). To improve PPW, researchers around the globe are taking different approaches around Dynamic Voltage and Frequency Scaling (DVFS) schemes, apart from improving transistor-level techniques.
  - Frequency and voltage scaling also has a direct impact on the response time. Processing units like the CPU are designed to provide low latency so that all incoming requests can be served in real time.
  - Improving efficiency without compromising on the latency is still a big challenge for the computer architects.
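
The trade-off between PPW and latency under DVFS can be sketched with the standard dynamic power relation P = C × V² × f. This is a minimal model under stated assumptions: it ignores static (leakage) power, assumes performance scales linearly with frequency, and uses made-up operating points and a made-up effective capacitance. `OperatingPoint` and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One voltage/frequency pair a DVFS governor could select."""
    voltage_v: float
    frequency_hz: float


def dynamic_power_w(point: OperatingPoint, capacitance_f: float) -> float:
    """Dynamic switching power: P = C * V^2 * f (leakage ignored)."""
    return capacitance_f * point.voltage_v ** 2 * point.frequency_hz


def performance_per_watt(point: OperatingPoint, capacitance_f: float,
                         instructions_per_cycle: float = 1.0) -> float:
    """Instructions per second per watt, assuming performance ~ IPC * f."""
    performance = instructions_per_cycle * point.frequency_hz
    return performance / dynamic_power_w(point, capacitance_f)


def latency_s(point: OperatingPoint, cycles_for_request: float) -> float:
    """Time to serve a request that needs a fixed number of cycles."""
    return cycles_for_request / point.frequency_hz


if __name__ == "__main__":
    capacitance = 1e-9                    # assumed effective capacitance (farads)
    fast = OperatingPoint(1.0, 2e9)       # assumed high-performance point
    slow = OperatingPoint(0.8, 1e9)       # assumed power-saving point
    for name, point in (("fast", fast), ("slow", slow)):
        print(name,
              dynamic_power_w(point, capacitance),
              performance_per_watt(point, capacitance),
              latency_s(point, 1_000_000))
```

In this simplified model the frequency cancels out of PPW, so the gain comes from the lower voltage, while the lower frequency lengthens the response time. That is the tension between efficiency and latency described above.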
Challenge 2:
- Data Center:
  - At the opposite pole, data centers are designed to be compute-intensive. The SoC required to cater to data centers has exactly the opposite need compared to portable devices. As companies become data aggregators, the analysis requires dedicated hardware that provides streamlined computation of the data on the go.
  - This is prompting companies like Google, Facebook, and Amazon to come up with their own silicon that understands the data being generated and how to swiftly analyze it on the go.
- Performance And High Throughput:
  - Designing a custom SoC requires a fresh look and is drastically different from the block-based approach. Improving throughput requires high-speed interconnects to remove bottlenecks in data processing; otherwise, performance will suffer.
  - In order to improve throughput, the data needs to reside near the computation block. This demands new ways to predict which data will be used so that it can be brought into the cache, or an additional level in the memory hierarchy with the help of MCDRAM.
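
Why keeping data near the computation block matters can be sketched with the textbook average memory access time (AMAT) formula: hit time plus miss rate times miss penalty. The example compares a cache backed directly by far memory with one that has an extra near-memory level in between (in the spirit of MCDRAM). All latencies and miss rates are illustrative assumptions, not figures for any real part.

```python
def average_access_time_ns(hit_time_ns: float, miss_rate: float,
                           miss_penalty_ns: float) -> float:
    """AMAT = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    cache_hit_ns = 1.0      # assumed on-chip cache hit time
    cache_miss_rate = 0.10  # assumed fraction of accesses missing the cache
    far_memory_ns = 100.0   # assumed latency of far (off-package) memory

    # Case 1: every cache miss goes straight to far memory.
    flat = average_access_time_ns(cache_hit_ns, cache_miss_rate, far_memory_ns)

    # Case 2: a near-memory level sits between the cache and far memory.
    near_hit_ns = 20.0      # assumed near-memory access time
    near_miss_rate = 0.20   # assumed fraction of near-memory misses
    near_penalty = average_access_time_ns(near_hit_ns, near_miss_rate,
                                          far_memory_ns)
    tiered = average_access_time_ns(cache_hit_ns, cache_miss_rate, near_penalty)

    print(flat, tiered)
```

With these assumed numbers, the extra level cuts the average access time from about 11 ns to about 5 ns. Better prediction of the data to be used would show up in this model as a lower miss rate.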
The challenges are many, and researchers around the globe, in both academia and the industry, are already working to provide elegant computer architectures.

WAY FORWARD

As the needs of the applications running on computer systems change, so does the approach to designing SoCs. Various examples from different companies show how the development of computer architecture is changing, and they will eventually help others come up with new computer architectures.

These new architecture designs are taking the traditional approach of computer architecture and providing a different way to tackle both memory and compute bottlenecks.

Cerebras came up with the Wafer-Scale Engine (WSE), which is built on the concept of fabricating a full wafer as a single SoC. The performance data of the WSE points to a future in which computer architecture becomes more about wafer-level design than die-level design. The WSE also takes a different approach to interconnects by using the wafer scribe lines to transfer data, which provides more bandwidth.

Fungible’s Data Processing Unit (DPU) architecture is another way forward that shows how SoCs will increasingly be designed for scale-out systems to handle massive data.

Picture By Chetan Arvind Patil

Google’s TPU and Amazon’s Inferentia show how custom ASIC-based SoCs will become the de-facto choice. Companies that generate a lot of data will try to run their data centers on in-house developed SoCs.

Apple’s M1 launch showed how ARM will start eating into the x86 market for energy-efficient portable devices. In a few years, the integration will become more intuitive and might attract other x86 portable device OEMs who have failed to take Windows on ARM to its true potential.

NVIDIA’s bid to acquire ARM shows that the future GPU will be designed with a blend of fusion technology that will combine ARM/CPU with GPU more than ever. This will allow data centers to improve on latency apart from focusing on throughput.

In the end, all of these are promising developments for the computer architecture community. They provide numerous opportunities to research and develop new ways to enable lower latency and higher throughput while balancing power consumption.
November 15, 2020
The Status Of Semiconductor Manufacturing In India
Photo by Laura Ockel on Unsplash

THE SEMICONDUCTOR STATUS IN INDIA

In 2020, the semiconductor industry got a lot of attention. From geo-politics to manufacturing advancement to mergers and acquisitions, the semiconductor industry has been all over the news.

This showed the world the growing need for and dependency on semiconductor technology. It also caught the attention of countries that are behind in semiconductor fabrication, assembly, and testing, specifically India, which imports 100% of its semiconductor products/devices.

A country with 1.3+ Billion people simply cannot afford to be lagging in the core semiconductor manufacturing technology, which provides the base for several technology products from smartphones to satellites.

There has been a push from the government to attract businesses all over the world to set up semiconductor fabrication, assembly, and testing facilities in India, by providing incentive based schemes. The reality is that much more is required than just the policies.

It is not that India does not have any semiconductor fabrication, assembly, and testing facilities.

All of the existing semiconductor fabrication facilities are owned and operated by the government of India for critical infrastructure needs like defense and space technology:
- Semi-Conductor Laboratory (SCL) is equipped to fabricate a 180nm CMOS process on 6-inch and 8-inch wafers. It also has packaging and testing capability.
- Bharat Electronics Limited (BEL) is another public sector company that has electronic manufacturing units but not semiconductor fabrication.
- Society For Integrated Circuit Technology And Applied Research (SITAR) has a 6-inch wafer processing capacity but the technology node is not advanced.
- IIT Bombay Nanofabrication Facility (IITBNF) is another lab that has the equipment to process 2-inch, 4-inch, and 8-inch silicon wafers, which are used primarily for research and development activities.
A few private companies are in semiconductor assembly and testing but not in fabrication:
- SPEL Semiconductor Limited is capable of providing turnkey post-fabrication testing and assembly solutions.
- ChipTest Engineering Private Limited (owned by SPEL’s parent company) is another assembly and testing solution provider, with an office in Asia-Pacific apart from India.
- Tessolve Semiconductor also provides semiconductor testing and product engineering related services.
- Apart from the above three, there are numerous private electronic assembly (different from semiconductor assembly and testing) service providers that do not fall in the semiconductor fabrication, assembly, or testing domain.
The above data shows that in-house semiconductor fabrication, assembly, and testing is yet to expand in India. There is a huge gap between supply and demand with respect to semiconductor manufacturing. On the other side, the semiconductor design industry in India is thriving, with almost all of the top FAB-LESS, IDM (design only), and ESDM design houses having R&D centers that are involved in niche semiconductor product design, including PDKs for advanced nodes.

This leads to the question, what are the missing links?

Picture By Chetan Arvind Patil

THE SEMICONDUCTOR MISSING LINK IN INDIA

End-to-end semiconductor manufacturing comprises the following types of semiconductor companies:
- FAB-LESS/IDM – Pre-Silicon semiconductor product design
- EDA – Pre-Silicon software tools and libraries to aid semiconductor product design
- FAB/Pure-Play Foundry/IDM/Equipment/Material/OSAT/ATMP – Post-Silicon semiconductor fabrication, assembly and testing
Pre-Silicon:

In terms of Pre-Silicon (FAB-LESS and EDA), India has an advantage over neighboring countries due to its active participation in the critical product development happening in the various FAB-LESS/EDA companies. This is important, as the semiconductor industry in 2020 showed a clear pathway towards a future with fewer IDMs and more FAB-LESS and FAB companies, which are two different segments. There are a few IDM companies in India, but these only cater to the design aspect and not the manufacturing aspect of semiconductor product development.

However, there are disadvantages too when moving from the design phase to the manufacturing phase of product development.

India is becoming heavily dependent on Asia Pacific countries for semiconductor fabrication, assembly, and testing. On top of that, Asia Pacific countries are not far behind when it comes to FAB-LESS design. With mature semiconductor manufacturing already in place, taking the lead in the design space will not be difficult for them.

Post-Silicon:

The major missing link in India’s semiconductor industry falls in the Post-Silicon domain. Without a dedicated MINI/MEGA/GIGA foundry, India lags far behind not only in the world but also in Asia.

Importing 100% of the semiconductor products and then simply assembling them is not viable from an import/export business point of view and puts India way behind when it comes to competing in the technology domain. On top of this, the OSAT business is yet to mature in India.

Semiconductor manufacturing is vital and with critical infrastructure going digital, it is more important than ever for countries to become self-reliant.

How To Fill The Gap:

India does not need a GIGA-FAB, which costs ~$12 Billion to set up and takes years to break even. A MINI/MEGA-FAB focusing on a specific market and targeting mature (larger) technology nodes (140nm and 180nm) might be a more viable option in terms of both cost and time.

The Indian government needs to involve, or create a separate entity out of, the active public semiconductor fabrication units like SCL and SITAR, and then form a partnership with world-renowned FAB companies. This is the best way to break the never-ending chain of inviting private partners to set up a billion-dollar facility that might never see the light of day.

Starting from the back end (OSAT) and then moving to the front end (FAB) is more business friendly for India.

To decrease the semiconductor manufacturing gap further, policies like PLI and SPECS need to be made more OSAT friendly, so that the global OSAT leaders can set up manufacturing units in India on their own and link their business with the design houses and assembly/ATMP manufacturers already present.

If India wants to stand a chance in the semiconductor post-silicon industry, then the government and private companies need to come together faster and start the work as soon as possible. It takes half a decade to envision and build a Class 1 semiconductor manufacturing facility. The clock is ticking.
November 8, 2020
The Semiconductor Industry Shake Up
Photo by Jason Leung on Unsplash

THE SEMICONDUCTOR INDUSTRY STATUS

In 2020, the semiconductor industry has seen both negative and positive trends.

The first half of 2020 showed a mostly negative trend driven by the COVID-19 restrictions, which led to slower semiconductor production and increased inventory due to decreasing sales. The second half of 2020 has been more positive. Sales have gone up and production lines are 100% occupied to cater to the devices and products newly launched by vendors across the globe.

Apart from the steady increase in design, development, and production, mergers and acquisitions have gone up too. There have been some unexpected takeovers which are bound to have a strong impact in the long run.

The semiconductor industry from a product point of view can be divided into:
- CPU
- GPU
- SoC/MPSoC/RFSoC
- ASIC/FPGA/ASSP/ACAP
- Digital/Analog/Mixed
- Memory
Picture By Chetan Arvind Patil

The mergers and acquisitions that have occurred in 2020 have affected each of the above product domains.
All these acquisitions on the design side have shaken up the semiconductor design industry. However, at the same time, this is turning out to be a boon for semiconductor manufacturing, as many IDMs plan on becoming FAB-LITE to focus more on the design aspect and increase their share of the mobile, AI, and data center markets.

This raises the question of what the future of semiconductor design and manufacturing is going to look like.

THE SEMICONDUCTOR INDUSTRY SHAKE UP AND FUTURE

Looking at it from the semiconductor design point of view, it is getting clear that companies are more focused on a specific product domain and want to dominate the market. To achieve this, companies are either creating new assets via acquisition or selling old assets that do not align with the goal.

Intel sold its smartphone modem business to Apple last year, and this year it also decided to sell its NAND memory business. This shows that Intel wants to focus on its strength in personal and data center computing. NVIDIA’s acquisition of ARM is certainly concerning for Intel, given how much strength ARM’s established IP will give NVIDIA. It will also allow NVIDIA to extend its business from GPUs to CPUs, including the smartphone business, which is not Intel’s primary domain.

On top, with AMD’s solid performance and acquisitions, the fight for smart computing is going to heat up. AMD (since 2009) and NVIDIA are FAB-LESS semiconductor companies. This allows AMD and NVIDIA to focus more on the design aspect and let external manufacturers take care of the manufacturing. This is a big advantage as semiconductor manufacturing is hard and takes a long time to perfect.

Picture By Chetan Arvind Patil

All of this points towards a major shake-up that will occur in the near future. The business model will change, and semiconductor companies will go either:
- FAB-LESS
- FAB/Pure-Play Foundry
Competing in both arenas as a single entity is going to be challenging. Spinning off or selling part of the semiconductor manufacturing business might be a more viable solution. Such a shake-up will eventually create more business for the semiconductor manufacturing companies, and they will have to anticipate it today and start planning to increase capacity (or make acquisitions) to keep the business running.

It will be vital for countries like India to take advantage of such market business change by coming out with policies that heavily incentivize semiconductor manufacturing.
November 1, 2020