AMD FireStream

AMD FireStream was AMD's brand name for its Radeon-based product line targeting stream processing and GPGPU in supercomputers. Originally developed by ATI Technologies around the Radeon X1900 XTX in 2006, the product line was previously branded as both ATI FireSTREAM and AMD Stream Processor.^[1] A FireStream card can also be used as a floating-point co-processor for offloading CPU calculations, which is part of the Torrenza initiative. The FireStream line has been discontinued since 2012, when GPGPU workloads were entirely folded into the AMD FirePro line.

Overview

The FireStream line is a series of add-on expansion cards released from 2006 to 2010, based on standard Radeon GPUs but designed to serve as a general-purpose co-processor, rather than rendering and outputting 3D graphics. Like the FireGL/FirePro line, they were given more memory and memory bandwidth, but the FireStream cards do not necessarily have video output ports. All support 32-bit single-precision floating point, and all but the first release support 64-bit double-precision. The line was partnered with new APIs to provide higher performance than existing OpenGL and Direct3D shader APIs could provide, beginning with Close to Metal, followed by OpenCL and the Stream Computing SDK, and eventually integrated into the APP SDK.

For highly parallel floating-point math workloads, the cards can speed up large computations by more than 10 times; Folding@home, the earliest and one of the most visible users of GPGPU, obtained 20 to 40 times the CPU performance.^[2] Each pixel and vertex shader, or unified shader in later models, can perform arbitrary floating-point calculations.

History

Following the release of the Radeon R520 and GeForce G70 GPU cores with programmable shaders, their large floating-point throughput drew attention from academic and commercial groups, who experimented with using them for non-graphics work. The interest led ATI (and Nvidia) to create GPGPU products, able to calculate general-purpose mathematical formulas in a massively parallel way, to process heavy calculations traditionally done on CPUs and specialized floating-point math co-processors. GPGPUs were projected to deliver immediate performance gains of a factor of 10 or more compared to contemporary multi-socket CPU-only calculation.

With the development of the high-performance X1900 XTX nearly finished, ATI based its first Stream Processor design on it, announcing it as the upcoming ATI FireSTREAM together with the new Close to Metal API at SIGGRAPH 2006.^[3] The core itself was mostly unchanged, except for doubling the onboard memory and bandwidth, similar to the FireGL V7350; new driver and software support made up most of the difference. Folding@home began using the X1900 for general computation, using a pre-release of version 6.5 of the ATI Catalyst driver, and reported a 20-40x performance improvement on the GPU over the CPU.^[2] The first product was released in late 2006, rebranded as AMD Stream Processor after the merger with AMD.^[4]

The brand became AMD FireStream with the second generation of stream processors in 2007, based on the RV670 chip with new unified shaders and double-precision support.^[5] Asynchronous DMA also improved performance by allowing a larger memory pool without the CPU's help. One model was released, the 9170, at an initial price of $1999. Plans included the development of a stream processor on an MXM module by 2008, for laptop computing,^[6] but it was never released.

The third generation quickly followed in 2008 with dramatic performance improvements from the RV770 core; the 9250 had nearly double the performance of the 9170, and became the first single-chip teraflop processor, despite dropping the price to under $1000.^[7] A faster sibling, the 9270, was released shortly after, for $1999.

In 2010 the final generation of FireStreams came out, the 9350 and 9370 cards, based on the Cypress chip featured in the HD 5800. This generation again doubled the performance relative to the previous one, to 2 teraflops in the 9350 and 2.6 teraflops in the 9370,^[8] and was the first built from the ground up for OpenCL. It was also the only generation to use fully passive cooling; no actively cooled version was available.

The Northern and Southern Islands generations were skipped, and in 2012, AMD announced that the new FirePro W (workstation) and S (server) series based on the new Graphics Core Next architecture would take the place of FireStream cards.^[9]

Models

Model	Video card equivalent	GPU core	Threads max.	Core SPUs	Core clock (MHz)	Memory bandwidth (GiB/s)	Memory type	Memory bus width (bit)	Memory amount (MiB)	Memory clock (MHz)	FP32 GFLOPS	FP64 GFLOPS	Peak TDP (watts)	Others
Stream Processor	Radeon X1900 XTX	R580	512	48	600	83.2	GDDR3	256	1024	650	375^[10]	N/A	≤165
9170^[5]	Radeon HD 3870	RV670	?	64 (320)	800	51.2	GDDR3	256	2048	800	512	102.4 ^NB3^[11]	≤105
9250^[12]	Radeon HD 4850	RV770	16,384^[13]	160 (800)	625	63.5	GDDR3	256	1024	993	1000	200 ^NB3	≤150
9270^[14]	Radeon HD 4870	RV770	16,384^[13]	160 (800)	750	108.8	GDDR5	256	2048	850	1200	240 ^NB3	<160
9350	Radeon HD 5850	Cypress	31,744^[15]	288 (1440)	700	128	GDDR5	256	2048	1000	2016	403.2	150	codenamed Kestrel
9370	Radeon HD 5870	Cypress	31,744^[15]	320 (1600)	825	147.2	GDDR5	256	4096	1150	2640	528	225	codenamed Osprey

Notes:

NB3: Estimated to be one-fifth of the theoretical figure for single-precision operations.
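
The FP32 and FP64 columns for the unified-shader models can be reproduced from the shader count and core clock. The following is a minimal sketch, assuming each stream processor retires one multiply-add (MADD) per clock and that a MADD counts as two floating-point operations; that assumption and the function names are illustrative and not taken from the table. The FP64 figure applies the one-fifth estimate from note NB3. The first-generation Stream Processor is left out because it does not use unified shaders.

```python
# Peak-throughput arithmetic for the unified-shader FireStream models.
# Assumption: one multiply-add (MADD) per stream processor per clock,
# counted as 2 floating-point operations.
FLOPS_PER_MADD = 2
FP64_DIVISOR = 5  # note NB3: FP64 estimated at one-fifth of FP32

# (model, stream processors, core clock in MHz), copied from the table above
MODELS = [
    ("9170", 320, 800),
    ("9250", 800, 625),
    ("9270", 800, 750),
    ("9350", 1440, 700),
    ("9370", 1600, 825),
]


def peak_fp32_gflops(stream_processors, clock_mhz):
    """Theoretical single-precision peak in GFLOPS."""
    return stream_processors * clock_mhz * FLOPS_PER_MADD / 1000


def estimated_fp64_gflops(stream_processors, clock_mhz):
    """Double-precision estimate: one-fifth of the FP32 peak (note NB3)."""
    return peak_fp32_gflops(stream_processors, clock_mhz) / FP64_DIVISOR


if __name__ == "__main__":
    for model, sps, clock in MODELS:
        fp32 = peak_fp32_gflops(sps, clock)
        fp64 = estimated_fp64_gflops(sps, clock)
        print(f"{model}: FP32 {fp32:.1f} GFLOPS, FP64 {fp64:.1f} GFLOPS")
```

Under these assumptions the results match the table, for example 1440 × 700 MHz × 2 = 2016 GFLOPS for the 9350.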

Software

Software Development Kit

After abandoning its short-lived Close to Metal API, AMD focused on OpenCL. AMD first released its Stream Computing SDK (v1.0) in December 2007 under the AMD EULA, to be run on Windows XP.^[16] The SDK includes "Brook+", an AMD hardware-optimized version of the Brook language developed by Stanford University, itself an open-source variant of ANSI C optimized for stream computing. The AMD Core Math Library (ACML) and AMD Performance Library (APL), with optimizations for the AMD FireStream, and the COBRA video library (later renamed "Accelerated Video Transcoding" or AVT) for video transcoding acceleration are also included. Another important part of the SDK, the Compute Abstraction Layer (CAL), is a software development layer aimed at low-level access, through the CTM hardware interface, to the GPU architecture, for performance-tuning software written in various high-level programming languages.
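
The stream-computing model that Brook-style languages target can be illustrated with a small conceptual sketch. The code below is plain Python, not Brook+ syntax, and the helper `run_kernel` is a hypothetical name introduced only for illustration. It shows the core idea: a kernel function is applied independently to corresponding elements of the input streams, which is what lets the hardware run many elements in parallel.

```python
# Conceptual sketch of the stream programming model in plain Python.
# This is NOT Brook+ code; `run_kernel` is a hypothetical helper.

def run_kernel(kernel, *input_streams):
    """Apply `kernel` independently to each tuple of corresponding elements."""
    lengths = {len(stream) for stream in input_streams}
    if len(lengths) != 1:
        raise ValueError("all input streams must have the same length")
    # Each element is processed without reference to any other element,
    # so a stream processor is free to evaluate them in parallel.
    return [kernel(*elements) for elements in zip(*input_streams)]


def madd(a, b, c):
    """Multiply-add kernel: the per-element operation."""
    return a * b + c


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [0.5, 0.5, 0.5, 0.5]
    c = [10.0, 20.0, 30.0, 40.0]
    print(run_kernel(madd, a, b, c))  # [10.5, 21.0, 31.5, 42.0]
```

The sequential list comprehension stands in for the parallel hardware; only the independence of the per-element work is being modelled here.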

In August 2011, AMD released version 2.5 of the AMD APP Software Development Kit,^[16] which includes support for OpenCL 1.1, a parallel computing language developed by the Khronos Group. Compute shaders, officially called DirectCompute in Microsoft's DirectX 11 API, are already supported by graphics drivers with DirectX 11 support.

AMD APP SDK

Main article: AMD APP SDK

Benchmarks

According to an AMD-demonstrated system^[17] with two dual-core AMD Opteron processors and two Radeon R600 GPU cores running on Microsoft Windows XP Professional, 1 teraflop (TFLOP) can be achieved by a universal multiply-add (MADD) calculation. By comparison, an Intel Core 2 Quad Q9650 3.0 GHz processor at the time could achieve 48 GFLOPS.^[18]

In a 2007 demonstration, Kaspersky SafeStream anti-virus scanning that had been optimized for AMD stream processors was able to scan 21 times faster with RV670-based acceleration than with the search running entirely on an Opteron.^[19]

Limitations

Recursive functions are not supported in Brook+ because all function calls are inlined at compile time. Using CAL, functions (recursive or otherwise) are supported to 32 levels.^[20]
Only bilinear texture filtering is supported; mipmapped textures and anisotropic filtering are not supported.
Functions cannot have a variable number of arguments. The same problem occurs for recursive functions.
Conversion of floating-point numbers to integers on GPUs is done differently than on x86 CPUs; it is not fully IEEE-754 compliant.
Doing "global synchronization" on the GPU is not very efficient, which forces the GPU to divide the kernel and do synchronization on the CPU. Given the variable number of multiprocessors and other factors, there may not be a perfect solution to this problem.
The bus bandwidth and latency between the CPU and the GPU may become a bottleneck.
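
When recursion is unavailable, a general workaround is to rewrite the recursive function as a loop over an explicit, fixed-capacity stack. The sketch below illustrates that technique in plain Python; it is not Brook+ or CAL code, and the tree layout, function names and default capacity are assumptions made for the example.

```python
# Replacing recursion with a loop and an explicit fixed-size stack.
# A tree node is a tuple (value, left, right); an empty subtree is None.

def tree_sum_recursive(node):
    """Reference version: the recursive form that cannot be inlined."""
    if node is None:
        return 0
    value, left, right = node
    return value + tree_sum_recursive(left) + tree_sum_recursive(right)


def tree_sum_iterative(root, capacity=64):
    """Same result using a preallocated stack instead of function calls."""
    stack = [None] * capacity  # fixed storage, allocated once
    stack[0] = root
    top = 1  # number of entries currently on the stack
    total = 0
    while top > 0:
        top -= 1
        node = stack[top]
        if node is None:
            continue
        value, left, right = node
        total += value
        if top + 2 > capacity:
            raise OverflowError("explicit stack capacity exceeded")
        stack[top] = left
        stack[top + 1] = right
        top += 2
    return total


if __name__ == "__main__":
    tree = (1, (2, None, None), (3, (4, None, None), None))
    print(tree_sum_recursive(tree), tree_sum_iterative(tree))  # 10 10
```

The fixed capacity mirrors the bounded call depth mentioned above: the loop fails loudly when the bound is exceeded instead of growing without limit.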

References

1. AMD Press Release
2. Gasior, Geoff (October 16, 2006). "A closer look at Folding@home on the GPU". The Tech Report. Retrieved 2016-05-26.
3. ATI SIGGRAPH 2006 Presentation (PDF) (Report). ATI Technologies.
4. Valich, Theo (November 16, 2006). "ATI FireSTREAM AMD Stream board revealed". The Inquirer. Retrieved 2016-05-26.
5. "AMD Delivers First Stream Processor with Double Precision Floating Point Technology". AMD. November 8, 2007. Retrieved 2016-05-26.
6. AMD WW HPC 2007 presentation (PDF) (Report). p. 37.
7. "AMD Stream Processor First to Break 1 Teraflop Barrier". AMD. June 16, 2008. Retrieved 2016-05-26.
8. "Newest AMD FireStream(TM) GPU Compute Accelerators Deliver Almost 2x Single and Double Precision Peak Performance and Performance Per Watt Over Last Generation". AMD. June 23, 2010. Retrieved 2016-05-26.
9. Smith, Ryan (14 August 2012). "The AMD Firepro W9000 W8000 Review Part 1". Anandtech.com. Retrieved 28 June 2016.
10. R580 shader core FLOPs
11. "AMD's RV670 does double-precision at half the speed". Tigervision Media. 1 February 2008.
12. AMD FireStream 9250 - Product page. Archived May 13, 2010, at the Wayback Machine.
13. "Entering the Golden Age of Heterogeneous Computing", Michael Mantor, Senior GPU Compute Architect / Fellow, AMD Graphics Product Group, slide 11 of 71
14. AMD FireStream 9270 - Product page. Archived February 16, 2010, at the Wayback Machine.
15. "Heterogeneous Computing: OpenCL™ and the ATI Radeon™ HD 5870 (“Evergreen”) Architecture", Advanced Micro Devices, slide 56 of 80
16. AMD APP SDK download page and Stream Computing SDK EULA. Archived March 6, 2009, at the Wayback Machine. Retrieved December 29, 2007.
17. HardOCP report, retrieved July 17, 2007
18. Intel microprocessor export compliance metrics
19. Valich, Theo (September 12, 2007). "GPGPU drastically accelerates anti-virus software". The Inquirer. Retrieved 2016-05-26.
20. AMD Intermediate Language Reference Guide, August 2008

This article is issued from Wikipedia - version of the 11/14/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
