Thursday, October 30, 2008

Green Clusters

A while ago I ran the numbers on using embedded processors for clustered computation. I took it as read that they'd be no good for floating point calculations, but there are a lot of problems that only require integer arithmetic:

Green Clustering - N Whiteford

The question of power consumption at large data centres is of considerable interest. The most notable example of this is Google, where VP of operations Urs Holzle recently stated "power consumption is likely to become the most critical cost factor for data-centre budgets" [1]. Speculation on Google's future plans also indicates that rack density is a significant consideration [2].


A standard IBM BladeCenter chassis has 14 server bays and occupies 7U of rack space (12in x 17.5in x 28in). Each bay may contain a dual processor 3.8GHz Intel blade [3], giving a total of 106.4GHz of aggregate clock speed, or 15.2GHz per unit of rack space. This is representative of the highest rack density available at the present time.
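The density figure above is simple arithmetic, and a short Python sketch makes the steps explicit. The constant and function names are my own, and summed clock speed is only the rough proxy this article uses, not a real performance measure.

```python
# Figures taken from the paragraph above; names are illustrative only.
BAYS_PER_CHASSIS = 14
PROCESSORS_PER_BLADE = 2
CLOCK_GHZ = 3.8
CHASSIS_RACK_UNITS = 7


def blade_density():
    """Return (aggregate GHz per chassis, aggregate GHz per rack unit)."""
    total_ghz = BAYS_PER_CHASSIS * PROCESSORS_PER_BLADE * CLOCK_GHZ
    return total_ghz, total_ghz / CHASSIS_RACK_UNITS


total_ghz, ghz_per_u = blade_density()
print(f"{total_ghz:.1f} GHz per chassis, {ghz_per_u:.1f} GHz per U")
```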

Here we present a new clustering technology based around low-power embedded processors, which we believe will increase the overall GHz/cm^2 and GHz/Watt.

Each node in such a cluster is a single embedded processor with associated storage and memory. A candidate node for such a cluster is the gumstix [4]. Each device measures 80mm x 20mm x 6.3mm and contains 16MB of flash memory, 64MB of RAM and a 400MHz PXA255 ARM processor. An onboard MMC slot is available on some models, and in the following scenario we shall assume this is populated with a 512MB MMC card. Each device consumes approximately 1W at peak utilisation.

The approximate dimensions of a 1U rack are: 43mm high x 444mm wide x 711mm deep. If laid flat, a single layer of gumstix in this area would contain 176 devices (22 across the width by 8 along the depth). Heat dissipation issues aside, such a rack should easily contain 4 such layers, or 704 devices, providing roughly 281GHz of processing power. However, it is not clear that the heat produced by such a system could be dissipated effectively.
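As a check on the packing estimate, here is a minimal sketch that tiles the boards in a simple grid, trying both orientations. It assumes no gaps for connectors, cabling or airflow, which a real design would need; the function name is hypothetical.

```python
# Dimensions in millimetres, taken from the text above.
RACK_WIDTH, RACK_DEPTH, RACK_HEIGHT = 444, 711, 43
BOARD_LENGTH, BOARD_WIDTH, BOARD_THICKNESS = 80, 20, 6.3
NODE_CLOCK_GHZ = 0.4  # 400MHz PXA255


def boards_per_layer(rack_w, rack_d, board_l, board_w):
    """Grid packing with no spacing, using the better of two orientations."""
    short_side_across = (rack_w // board_w) * (rack_d // board_l)
    long_side_across = (rack_w // board_l) * (rack_d // board_w)
    return max(short_side_across, long_side_across)


per_layer = boards_per_layer(RACK_WIDTH, RACK_DEPTH, BOARD_LENGTH, BOARD_WIDTH)
layers = 4  # the article's assumption; 4 * 6.3mm is well under 43mm
devices = per_layer * layers
stack_height = layers * BOARD_THICKNESS

print(f"{per_layer} per layer, {devices} devices, "
      f"{devices * NODE_CLOCK_GHZ:.1f} GHz, stack {stack_height:.1f}mm high")
```

The two orientations give 176 and 175 boards per layer, so the orientation barely matters at this footprint.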

The IBM BladeCenter chassis previously mentioned is rated to consume 2000W. As it is a 7U device, this gives approximately 285W per unit of rack space. We shall therefore limit each gumstix rack unit to this power consumption. This allows each rack unit to contain 280 gumstix nodes, leaving an additional 5W for interconnect and routing requirements.

280 gumstix nodes would result in a total of 112GHz of processing power, 17.5GB of RAM, 4.4GB of onboard flash and 140GB of MMC flash storage. It is, however, the processing power that is of the most interest; the various memory capacities could be increased without significant difficulty.
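The power-limited node count and the totals above can be reproduced with the sketch below. The 1W per node and 5W interconnect allowance are the article's own rough assumptions, and the variable names are mine.

```python
# Power budget, from the BladeCenter rating quoted above.
CHASSIS_WATTS = 2000
CHASSIS_RACK_UNITS = 7
NODE_WATTS = 1          # approximate peak draw per gumstix
INTERCONNECT_WATTS = 5  # allowance assumed in the text

# Per-node resources (MB) and clock speed (GHz).
NODE_CLOCK_GHZ = 0.4
NODE_RAM_MB, NODE_FLASH_MB, NODE_MMC_MB = 64, 16, 512

budget_per_u = CHASSIS_WATTS // CHASSIS_RACK_UNITS  # 285W, rounded down
nodes = (budget_per_u - INTERCONNECT_WATTS) // NODE_WATTS

totals = {
    "clock_ghz": nodes * NODE_CLOCK_GHZ,
    "ram_gb": nodes * NODE_RAM_MB / 1024,
    "flash_gb": nodes * NODE_FLASH_MB / 1024,
    "mmc_gb": nodes * NODE_MMC_MB / 1024,
}

print(f"{nodes} nodes per U")
for name, value in totals.items():
    print(f"  {name}: {value:.1f}")
```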

When comparing the 112GHz of PXA255 ARM processing power with the 15.2GHz of 64-bit Intel processing power we must be careful not to make any rash judgements, as we are far from comparing like with like. The PXA255, for example, has no FPU and therefore performance of floating point operations will be abysmally slow. I also do not have access to a 3.8GHz processor, so we have to jump through a number of hoops when estimating the relative processing capabilities of these devices.

We know the gumstix has approximately 5 times the processing power of a 90MHz Pentium [6] for integer operations. We shall not consider floating point performance, as this will be understandably poor and, for many applications (such as string searching), unnecessary.

A Dell XPS Pentium 90MHz was previously rated at 2.88 under SPECint95 [7]. We can therefore estimate the gumstix SPECint95 rating as 14.4. SPECint95 was retired in 2000, so we cannot compare this directly to the rating of a 3.8GHz Intel Xeon. However, a 1.0GHz Athlon rates 42.9 under SPECint95, and a slightly faster processor (a 1.2GHz Athlon) rates 458 under CINT2000. This allows us to convert a SPECint95 rating to CINT2000, very approximately, by multiplying by a factor of ten. Under CINT2000, the 3.8GHz Intel Xeon IBM eServer (hyperthreading disabled) rated 1820. Enabling hyperthreading may double this value; taking that generous upper bound gives us a relative rating of 3640 for the 3.8GHz Xeon and 144 for the PXA255 used in the gumstix.
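The chain of estimates above is easier to follow written out step by step. The sketch below uses only the figures quoted in the text; the factor of ten and the doubling for hyperthreading are the article's rough assumptions rather than measured values, and the names are mine.

```python
# Benchmark figures quoted in the text.
PENTIUM_90_SPECINT95 = 2.88
GUMSTIX_VS_PENTIUM_90 = 5        # integer speed-up claimed in [6]
ATHLON_1000_SPECINT95 = 42.9
ATHLON_1200_CINT2000 = 458
XEON_CINT2000_NO_HT = 1820

# Step 1: scale the Pentium 90 rating to the gumstix.
gumstix_specint95 = PENTIUM_90_SPECINT95 * GUMSTIX_VS_PENTIUM_90

# Step 2: estimate a SPECint95 -> CINT2000 conversion from the two Athlons.
# The processors differ in clock speed, so this is only indicative.
observed_ratio = ATHLON_1200_CINT2000 / ATHLON_1000_SPECINT95
CONVERSION = 10                  # the article's rounded factor

# Step 3: apply the conversion, and the assumed hyperthreading doubling.
gumstix_cint2000 = gumstix_specint95 * CONVERSION
HYPERTHREADING_FACTOR = 2        # optimistic assumption from the text
xeon_cint2000 = XEON_CINT2000_NO_HT * HYPERTHREADING_FACTOR

print(f"gumstix SPECint95 estimate: {gumstix_specint95:.1f}")
print(f"observed Athlon ratio:      {observed_ratio:.2f}")
print(f"gumstix CINT2000 estimate:  {gumstix_cint2000:.0f}")
print(f"Xeon CINT2000 (with HT):    {xeon_cint2000}")
```

Note that the observed Athlon ratio is somewhat above ten even though the CINT2000 figure comes from the faster part, so rounding down to ten is a coarse simplification.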

We previously showed that 1U of IBM BladeCenter rack space provides 15.2GHz of processing power, that is, four 3.8GHz processors. Using the above rating, this equates to 14560 under CINT2000. 1U of gumstix processing (280 nodes) would provide a rating of 40320. These very rough calculations show that a gumstix cluster could provide in excess of two and a half times the integer computational power of the best existing servers in the same density and at the same power consumption. Furthermore, as it may be easier to dissipate the heat of gumstix clusters than of traditional compute clusters (due to the larger surface area over which the heat is produced), it may be possible to double or triple the rack density stated. It may also be possible to reduce the power requirements of the gumstix cluster by reducing the operating voltage (the figures shown are based on an operating voltage of 4.5V, which may be reduced to 3.6V).
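The final per-rack-unit comparison follows directly from the estimated ratings. A minimal sketch, with variable names of my own choosing and all the earlier caveats about the ratings still applying:

```python
# Estimated CINT2000-equivalent ratings per processor, from the text.
XEON_RATING = 3640
GUMSTIX_RATING = 144

# Processors per unit of rack space.
XEONS_PER_U = 14 * 2 // 7   # 14 dual-processor blades in 7U
GUMSTIX_PER_U = 280         # power-limited node count

blade_rating_per_u = XEON_RATING * XEONS_PER_U
gumstix_rating_per_u = GUMSTIX_RATING * GUMSTIX_PER_U
advantage = gumstix_rating_per_u / blade_rating_per_u

print(f"blades:  {blade_rating_per_u} per U")
print(f"gumstix: {gumstix_rating_per_u} per U")
print(f"ratio:   {advantage:.2f}x")
```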
