Arm and Intel Duel for the Universe

Transistors by the Watt … or the CISC vs RISC war



The question is often asked: “What is the difference between PC and ARM chips?” “Are ARM chips as powerful as PC chips?” Many glossy pages and posts are written about the “PC vs ARM wars.” What consumers really mean is “When will the tablet be as useful as a PC?”, and what journalists mean is “hot tagline.” So what is the real difference between these two architectures?

First, let’s be clear: there is no good guy here and no bad guy. This is not a David and Goliath contest of biblical proportions. This is suits vs suits, ecosystem vs ecosystem; everyone here has access to all the same tools and techniques as everyone else. Each ecosystem has pluses and minuses, and today, in all likelihood, most people use both of them. Your phone uses an ARM chip and your PC uses, well, a PC chip (Intel or AMD). Your local bank machine uses a PC and your car’s fuel injection uses an ARM chip (probably). If you are still using a first-generation Nokia Communicator, it uses a PC chip too.

Part 1: A little history

 

Once upon a time every electronics company made PCs, that is, personal computers. These were computers smaller than minicomputers, which were in turn smaller than mainframes. Most mainframes were built by IBM and a few other large companies. An explosion of personal computer vendors, large and small, followed. This lasted until Intel and Microsoft agreed to form a duopoly controlling the market, which went well for some years (see Wintel, Wikipedia). At the same time, Texas Instruments (TI) decided to leave the PC market and focus on an emerging technology called digital signal processing (DSP). In 1995 Texas Instruments released the first ARM processor with digital baseband and DSP (link to TI document). This was an ARM7, the first embedded processor with some serious MIPS and hardware debug systems (Wikipedia JTAG). These factors created a very popular processor.

This was the start of what would later become the TI OMAP series of ARM processors, and the start of the shift in computing to the mobile sector. Not to mention the move to … wait for it … digital mobile (cell) phones. (You were using a 120 MHz Pentium at the time.)

Skipping ahead to 2002 and the ARM9 series, the TI processor contrasted sharply in architecture with the XScale that Intel inherited from DEC. Let’s take a look at some diagrams below.

OMAP 1510 Multi-Core (ARM + DSP)

Texas Instruments played a neat trick here: rather than pushing every command through a general-purpose processor, it diverted all the voice and sound functions of the platform to the digital signal processor (DSP). On a phone, converting analog signals to digital signals is a large part of the workload. Offloading these functions to a DSP that was faster and used less power made a lot of sense.

XScale PXA Processor (note internal bus)

 

The difficulty was that the programs (code) had to know the DSP was there in order to address it, and putting this in place took some very difficult coding.

DSP Process

 

However, this code allowed TI, and in time Nokia, to build devices with better battery life and smaller size. It allowed new compressed forms of music (MP3s) to be played while using very little battery power. Nokia came into its own here, using its own OS and OMAP chips to create endless variations of easy-to-use devices with long battery life.

Intel, by contrast, chose an architecture that favored simple programming and a central processor. The wide internal bus ensured fast transfer of data between parts of the chip. This worked well at first, with many XScale devices getting to market quickly.

Pocket PC and XScale devices proliferated. OEMs liked the combination of XScale and the Microsoft OS because they worked well together. However, in order to compete, Intel had to increase the clock speeds of its products to match the effective performance of TI’s chips and other chips with DSPs. This in turn led to more power consumption and shorter battery life. To solve this, Intel would have had to move to a smaller manufacturing process, which would in turn have decreased the cost of the product but demanded large investments. By the time the PXA27x came around in 2004, only the fastest-clocked chips (624 MHz) were competitive in the market. In 2006 Intel sold the XScale line to Marvell. (It should be noted that the PXA270 is a very successful processor that has been produced from 2004 to the present (link). There is something to be said for easy software implementation.)

The next generation of OMAP chips, the 24xx series (2005), integrated video decoding and 3D processing (PowerVR) cores on the chip. This allowed the phone or device to decode video and provided some simple 3D effects and transparency. In this design each core must be individually addressed for a specific function. This allows the ARM11 core to be relatively low power and low clock rate compared to the effective performance of the chip. At the time, the typical OMAP 2420 was running at 150 MHz and the typical XScale at 300 MHz to 600 MHz.

OMAP 2420 Multi-Core Heterogeneous Computing 2005

 

History is long and boring, but there you have it: the first popular heterogeneous computer (Wiki link). This chip includes an ARM11 processor, a PowerVR graphics solution, an MPEG decoder and a DSP. This is the basis for today’s ARM architectures.

Let’s review. We have two design paradigms here, so where does this leave us in the ARM vs PC “chip war”? One is general (Intel) and the other specific (OMAP). In the first case you just ask a question and the processor grinds through it. In the second case you have to be very specific about whom you ask the question, or not much will happen. You must dice your question finely so as not to miss the target hardware. This is the same difference that separates ARM chips and PC chips today.
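To make the contrast concrete, here is a toy Python sketch of the two paradigms. It is only an illustration under stated assumptions: the task labels and block names below are hypothetical, loosely following the OMAP 2420 blocks named above, and do not correspond to any real TI or Intel programming interface.

```python
# Toy model of "general" (PC-style) vs "specific" (OMAP-style) dispatch.
# Task labels and block names are hypothetical, not a real SoC API.

GENERAL_PURPOSE = "CPU"

# Heterogeneous chip: each kind of work has a dedicated hardware block.
ACCELERATORS = {
    "voice": "DSP",
    "mp3": "DSP",
    "video": "MPEG decoder",
    "3d": "PowerVR GPU",
}


def dispatch_general(task: str) -> str:
    """PC-style model: every task is simply handed to the CPU."""
    return GENERAL_PURPOSE


def dispatch_heterogeneous(task: str) -> str:
    """OMAP-style model: the software must name the right block.

    Work that the code does not explicitly route to a dedicated block
    falls back to the general-purpose core and loses the power savings.
    """
    return ACCELERATORS.get(task, GENERAL_PURPOSE)


if __name__ == "__main__":
    for task in ["voice", "video", "spreadsheet"]:
        print(f"{task:<12} general: {dispatch_general(task):<4} "
              f"specific: {dispatch_heterogeneous(task)}")
```

Note how the "spreadsheet" task, which nobody routed to a dedicated block, lands on the CPU in both models; on an OMAP-style chip the savings only appear for work the software explicitly sends to the right block.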

Part 2: Everyone has the same tools

Let’s take a look at some of the common debate points and drill down into some facts:

Architecture: Many people in the “ARM camp” have the notion that ARM has some “magic pixie dust” that makes its chips more power efficient, which they call RISC (Reduced Instruction Set Computing). This was true some years ago, but Intel saw the virtue in the magic pixie dust and stole it. They now call this Intel Architecture (IA), and its 64-bit version (Intel 64, based on AMD64) means they borrowed something from AMD as well. This is not a one-way street: ARM is using design strategies that Intel pioneered in order to bump up performance in its Cortex-A9 and Cortex-A15 architectures. As I mentioned above, everyone has the same set of tools.

If there is no magic pixie dust to be found, why do ARM processors use less power than PC processors? Simple: they use fewer transistors. The Intel Core i7-980X has 1.17 billion transistors, the i7-960 has 761 million, the Intel Atom N450 has 47 million, and an ARM A9 dual core has around 26 million. If we look at the computing industry over time, the real change in power consumption comes with the manufacturing process, not with the architecture (see Semiconductor Fabrication (link)).
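As a rough back-of-envelope, the counts above can be expressed as multiples of the ARM A9 dual core. Treating power as roughly proportional to transistor count is a simplifying assumption for illustration only: at a fixed process node and activity level it is a reasonable first guess, but real chips also differ in clock, voltage, leakage and power gating.

```python
# Transistor counts quoted in the paragraph above.
TRANSISTORS = {
    "Core i7-980X": 1_170_000_000,
    "Core i7-960": 761_000_000,
    "Atom N450": 47_000_000,
    "ARM A9 dual core": 26_000_000,
}

BASELINE = "ARM A9 dual core"


def relative_to_baseline(counts: dict[str, int], baseline: str) -> dict[str, float]:
    """Express each chip's transistor count as a multiple of the baseline chip.

    Assumption (simplification): at the same process node and activity,
    power grows roughly with the number of switching transistors.
    """
    base = counts[baseline]
    return {name: n / base for name, n in counts.items()}


if __name__ == "__main__":
    for name, ratio in relative_to_baseline(TRANSISTORS, BASELINE).items():
        print(f"{name:<18} {ratio:6.1f}x")
```

On this crude measure the i7-980X carries about 45 times the transistor budget of the dual-core A9, which is why the process node, rather than the instruction set, dominates the power story.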

Note: Process Node has much more influence on power consumption than architecture

The Nvidia Tegra 2 is a very interesting system on chip (SoC); Nvidia puts its gate count at 260 million transistors. The ARM A9 processors are said to take up 10% of that area, hence the 26 million used above. When you use all of those gates, power goes up considerably. Key to low-power processors is “power gating”. This is something that Intel is only starting to do on its processors, but ARM has done for a very long time. The Atom Z670 is, to my knowledge, the first Intel chip that uses power gating. Part of Intel’s difficulty is that this is not yet supported by Windows. Thus, consumers will not see the improvement in power usage that they would with, say, MeeGo or Android. Windows 8 is on the way to solve that, but it is not here yet…

Power Gating - Thanks to Fujitsu for the Illustration (link)

Above you see that the parts of the silicon that are not in use at a given time can have their power blocks turned off. This gives ARM processors very fine-grained control of power management that PC processors will only acquire in the future. Below is an illustration of the blocks on the Tegra 2. If you are listening to MP3s, you just use a low-power ARM7 processor and an audio decoder. Flash and 3D are rendered on the graphics processor. Internet browsing uses the two ARM A9 cores. Video is played back with the low-power ARM7 processor and the HD video decoder. There is also an HD video encoder just for compressing incoming video streams. Mostly on iOS and Android devices you are only using a portion of the chip at a time (say, listening to MP3s), and this allows the rest to be power gated “off”. Thus, you get 30 hours of MP3 playback because you are only using the ARM7 and the audio decode portions of the chip.
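The per-activity gating described above can be sketched as a small lookup. This is a toy model, not Nvidia’s power-management firmware: the block names and the use-case table are hypothetical labels that simply follow the paragraph above, and a real SoC gates power in hardware and drivers rather than in application code.

```python
# Toy model of per-activity power gating on a Tegra 2-style SoC.
# Block names and use cases are hypothetical labels taken from the text,
# not a real driver or firmware interface.

BLOCKS = frozenset({
    "ARM7", "A9 core 0", "A9 core 1", "GPU",
    "audio decoder", "HD video decoder", "HD video encoder",
})

# Blocks each activity needs, as described in the text (simplified).
USE_CASES = {
    "mp3": {"ARM7", "audio decoder"},
    "flash/3d": {"GPU"},
    "web": {"A9 core 0", "A9 core 1"},
    "video": {"ARM7", "HD video decoder"},
}


def gating_plan(activities):
    """Return (powered_on, gated_off) block sets for the given activities.

    Every block that no active use case needs is power gated off.
    """
    powered = set()
    for activity in activities:
        powered |= USE_CASES[activity]
    gated = set(BLOCKS) - powered
    return powered, gated


if __name__ == "__main__":
    on, off = gating_plan(["mp3"])
    print("on: ", sorted(on))
    print("off:", sorted(off))
```

For MP3 playback only two of the seven blocks stay powered, which is the intuition behind the long playback times quoted above.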

Nvidia Tegra 2

 

Part 3: It’s the Software

All of these different processes using different parts of the chip make for a “non-standard” implementation. This is not like a PC architecture, where you simply drop Windows on top of your home-built PC. These optimized blocks of logic give ARM a performance advantage, but they are difficult to program for. Every ARM vendor has its own set of features that need custom code to run. The most popular Android tablet to date, the ASUS Transformer, has had problems with its power management (link). The recently abandoned HP WebOS tablet had major difficulties with the software implementation on its Qualcomm processor (link). Texas Instruments should be lauded for making efforts to get inexpensive development kits into the hands of “The Community” in the form of the BeagleBoard and PandaBoard. This has allowed TI to expand the base of software for its OMAP chip line.

Linus Torvalds recently had a few words to say about the state of ARM in the Linux kernel (link):

“No concentrated effort to have a framework for things… since we try to support a lot of the ARM architecture, it’s been a painful thing for me to see, look at the x86 tree and ARM tree and it’s many times bigger. It’s not constrained by this nice platform thing, it just has random crap all over it. And it was getting to me.”
In the end he said, “I just snapped, and instead of running around naked with a chainsaw like I usually do, I started talking to people and a lot of people admitted it’s a problem.”
While Torvalds accepted that “a lot of people love to hate the PC,” the fact that Intel, AMD and hardware makers worked on building a common infrastructure “made it very efficient and easy to support.”

It should be noted that all of this gives Apple a huge advantage in the ARM market. First, it is not obliged to turn out numerous variations of devices: it has several ARM devices (iPod, iPhone, iPad and Apple TV), all running iOS, that are updated on a regular product cycle. Second, it has a culture of perfectionism that serves it well when implementing its OS on ARM devices. Apple always strikes a careful balance between features and usability.

Part 4: PC Architecture

The PC industry has used a different processor paradigm: it uses multiple identical cores that are general purpose in nature. This allows the processor to chew through identical blocks of code at great speed. Prior to the Pentium 4, Intel and AMD had always used clock speed to increase the processing power of single-core chips. The Intel Pentium 4 marked the point in processor development at which increasing the clock speed resulted in a power increase that was difficult to justify, let alone dissipate. (At levels over 130 W it is difficult to get rid of the heat generated in a reliable manner, barring exotic solutions.) AMD and Intel have since been using multicore chips and incremental adjustments to the PC architecture to increase performance.
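Why did raising the clock become so expensive? The standard first-order estimate of CMOS switching power is P = α·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). Because higher clocks usually also need a higher supply voltage, power grows faster than the clock. The sketch below assumes, purely for illustration, that a 50% clock increase requires a 20% voltage increase; these are not measured Pentium 4 figures.

```python
def dynamic_power(activity: float, capacitance: float,
                  voltage: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = alpha * C * V^2 * f (watts)."""
    return activity * capacitance * voltage ** 2 * freq_hz


def relative_power(freq_scale: float, voltage_scale: float) -> float:
    """Power relative to a baseline when only clock and voltage are scaled.

    Activity factor and capacitance are held fixed (same design, same process).
    """
    baseline = dynamic_power(1.0, 1.0, 1.0, 1.0)
    scaled = dynamic_power(1.0, 1.0, voltage_scale, freq_scale)
    return scaled / baseline


if __name__ == "__main__":
    # Illustrative assumption: +50% clock needs +20% supply voltage.
    print(f"1.5x clock, 1.2x voltage -> {relative_power(1.5, 1.2):.2f}x power")
```

With those assumed numbers, power rises by roughly 2.16 times for a 1.5 times faster clock, the kind of curve that pushed Intel and AMD toward multiple cores instead.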

The software integration of these chips is simple. Once an application is “multi-threaded” (link and SMT link), that is, it can run work on all cores at the same time, the performance of the application goes up by the number of cores at its disposal. Well, that is the theory; in practice this only really holds true for rendering programs such as V-Ray or Maxwell. However, all multi-threaded applications show some improvement in performance.
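The gap between theory and practice is usually explained with Amdahl’s law, a standard model that the paragraph above does not name: if only a fraction p of a program’s run time can be spread across n cores, the speedup is 1 / ((1 − p) + p/n). The fractions used below are illustrative assumptions, not measurements of V-Ray, Maxwell or any other program.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of run time that can be spread
    across cores; the remaining (1 - p) still runs on a single core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions: near-perfect renderer down to a half-serial app.
    for p in (0.99, 0.9, 0.5):
        print(f"p={p:.2f}: 4 cores -> {amdahl_speedup(p, 4):.2f}x")
```

A renderer that is almost entirely parallel gets close to the ideal 4 times speedup on four cores, while an application that is half serial gets only 1.6 times.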

Intel Nehalem 45nm - Four Identical Cores

Alternatively, the user can run multiple applications on multiple cores and the OS figures out where each process will run. This allows the user to shift quickly between tasks and has enabled a dramatic boost in office productivity. Prior to the Pentium 4, the user might be able to have two applications open at the same time. These applications were single-threaded, and the computer would have to cache data as it shifted between applications. This resulted in a short delay while the computer worked. Too many applications open at the same time could lead to very large delays in writing to the cache, system instability, or both.

Anyway, the basic mantra of the PC has been general-purpose processing: “dump the code in and we grind it up”, with no custom slicing and dicing as in ARM. That takes a gourmet chef. Well, PCs are becoming a little more gourmet with the advent of general-purpose computing on graphics processing units, GPGPU (link). This is one of the places where AMD and Intel are stealing ARM’s “magic pixie dust”. Multiple cores can only carry increasing performance so far, so AMD and Intel are being pushed into the “heterogeneous computing” model. Presently they are focusing on the graphics processor. This is very much like the inclusion of the DSP in the ARM architecture. In fact, both DSPs and PC graphics processors use similar “Very Long Instruction Word” (link) programming.

Sandybridge 32nm - Four Cores + Graphics (image xtremehardware.it)

In the Intel architecture, the graphics processing unit (GPU) takes up about 20% of the hardware. There are a number of reasons this is such a small percentage of the die area. Intel focuses on general-purpose graphics and video decode, which means very simple 3D support. AMD uses about 50% of its die for the GPU, as it supports 3D graphics on its processors at a level usable for gaming. In both cases the GPU does not yet make up a greater part of the chip, because GPGPU computing is still very young and there is not much you can do with it. In ARM processors, the DSP supports mobile broadband (GSM or CDMA). This is something that users need and want; it saves battery power on your phone. The most practical consumer application for the GPU is encoding video, which it does much faster than the CPU alone could manage. This is something that both Nvidia and Intel have implemented. Otherwise, the software is best targeted at computational problems; read: scientific computing.

Thus we can sum up: ARM and PC systems use very different architectures. However, these architectures are becoming more similar as they progress toward the now-shared goal of more power and less energy. In the near term ARM will remain dominant in the low-power market and the PC will retain the high-performance market. Each vendor is supporting forays into the other’s camp, as each sees profits there. Intel maintains a hold on the high-margin PC market, yet ARM maintains dominance in the fast-growing low-power market. Consumers just want devices that work… the real question here is what sort of work you would like to do. (Your phone already has more processing power than was used to send the Apollo missions to the moon.)

Next time we will cover the interaction of software and hardware in the execution of “work”.

Stay Tuned…