Ask the Experts - ARM Fellow Jem Davies Answers Your GPU Questionsby Anand Lal Shimpi on June 30, 2014 11:52 AM EST
- Posted in
- Ask the Experts
When we ran our Ask the Experts with ARM CPU guru Peter Greenhalgh some of you had GPU questions that went unanswered. A few weeks ago we set out to address the issue and ARM came back with Jem Davies to help. Jem is an ARM Fellow and VP of Technology in the Media Processing Division, and he's responsible for setting the GPU and video technology roadmaps for the company. Jem is also responsible for advanced product development as well as technical investigations of potential ARM acquisitions. Mr. Davies holds three patents in the fields of CPU and GPU design and got his bachelor's from the University of Cambridge.
If you've got any questions about ARM's Mali GPUs (anything on the roadmap at this point), the evolution of OpenGL, GPU compute applications, video/display processors, GPU power/performance, heterogeneous compute or more feel free to ask away in the comments. Jem himself will be answering in the comments section once we get a critical mass of questions.
Post Your CommentPlease log in or sign up to comment.
View All Comments
fbennion - Monday, June 30, 2014 - linkCould you expand on what use cases the GPUs are being used to accelerate? For example, are they being used to accelerate web browsing, or what use cases do you see them being used to accelerate in the future?
JemDavies - Tuesday, July 1, 2014 - linkTake a look at my responses to twotwo later in the thread
bengildenstein - Monday, June 30, 2014 - linkThank you very much for doing this. Here are my questions.
1) What is the die area of a Mali T760 core at 28nm?
2) What is the peak power consumption of a Mali T760 core at 695MHz and 28nm?
3) Why were SIMD 128-bit vector ALUs chosen over a series of scalar ALUs?
4) What are your plans regarding ray-tracing in hardware?
5) Will the Mali T760 support the Android Extension Pack?
6) Can the Mali GPUs (and available APIs) support developer accessible on-chip memory.
JemDavies - Monday, June 30, 2014 - linkThanks for your questions. I cannot address your first two questions as those are numbers we typically do not make public.
As far as your other questions go, happy to respond. In a blog I posted last year - http://community.arm.com/groups/arm-mali-graphics/... - graphics is a really computationally intensive problem. A lot of that graphics-specific computation consists of vector-intensive arithmetic, in particular, 3*3 and 4*4 vectors and arrays of floating-point (increasingly 32-bit) arithmetic.
The Midgard architecture (our GPUs from Mali-T600 series onwards) have some ALUs that work naturally on 128-bit vectors (that can be divided up as 2 64-bit, 4 32-bit or 8 16-bit floats, and 64, 32, 16 and 8 bit integers (we also have some scalar ALUs as well). All architectures chase the target of unit utilisation, and there are several approaches to this. We have taken the vector SIMD approach to try to gain good utilisation of those units. We have a heavily clock-gated arithmetic unit, so only active SIMD lanes have their clock enabled; and only active functional units have their clock enabled; and hence a scalar-operation-only instruction word will not enable the vector units.
Scalar warp architectures can perform better on code that does not have any vector code (either naturally vector or through a vectorising compiler). However, the big disadvantage of scalar warp architectures is that of divergent code. Whereas we effectively have a zero branch overhead in Midgard, warps struggle with branching code, as they cannot keep all lanes of the warp system occupied at the same time if not all warps are executing the same branch of divergent code (they effectively have to run each piece of code twice – one for each side of the branch). Other disadvantages are if the warps are accessing data that is not in the same cache line, and in the end, optimising for warp code can mean optimising with detailed understanding of the memory system, which can be more complex to get one’s head around than optimising for a more obvious construct like a vector architecture. In the end both types of architecture have advantages and disadvantages.
Looking at compute code (or at least that used in GPU Computing), if code can be vectorised, either by having naturally vector-style maths, or by unrolling loops and executing multiple iterations of the loop at the same time in vector units, then it should be very efficient on vector architectures such as Midgard. We tend to find that image processing contains a lot of code that can be parallelised through vectorisation.
Ray-tracing is an interesting question and a very fashionable one. When deployed in non-real-time renderers such as are used to generate Hollywood CGI-style graphics, the results can be awesome. However, the guys doing that don’t measure performance in frames per second, they measure in hours per frame (on supercomputers). Ray-tracing is a naturally very computationally intensive method of generating pictures. For a given computational budget (a given power/energy budget) you will usually get better-looking pictures using triangle/point/line rasterization methods. To be clear, we have no problem with ray-tracing, but we don’t see its adoption in real-time mobile graphics in the near future. There is also the problem that there are no real standards in this area, which means developer adoption is further hampered.
We will be supporting the Android Extension packs on all our Midgard GPUs.
As a tile-based renderer, we have tile-buffer memory inside the GPU (on-chip) and we are providing API extensions to access this and working with the industry to get standards agreed for this that are non-vendor-specific so that developers can adopt them with confidence. We have presented papers on this at conferences such as SIGGRAPH.
bengildenstein - Monday, June 30, 2014 - linkMany thanks! I greatly appreciate you having made time for my questions, and your appreciate your valuable insight.
twotwotwo - Monday, June 30, 2014 - linkAh, great answer and gets at my question about where y'all see GPU computing potentially being useful on your architecture.
Nintendo Maniac 64 - Monday, June 30, 2014 - linkRecently attention has been called to ARM's poor OpenGL drivers on mobile devices (ranked "Bad" by Dolphin devs), even when compared to Nvidia/AMD/Intel on Windows where OpenGL support is not a main focus of driver development.
Is there anything being done about these unsatisfactory OpenGL drivers?
Here's the Dolphin Devs' blogpost about OpenGL driver quality:
saratoga3 - Monday, June 30, 2014 - linkI would also be really interested to hear more about this. It seems like GPU drivers on ARM devices are serious source of frustration for developers, and given that they can do things like reproducibly crash devices from user mode, probably a fairly serious security risk as well.
Is ARM taking steps to address these problems?
JemDavies - Monday, June 30, 2014 - linkIn a post-PC world, in the sort of devices that our GPUs are built into, desktop OpenGL is of very limited relevance, and so we choose not to support OpenGL. We support OpenGL ES, OpenCL, Google’s RenderScript, and Microsoft’s DirectX.
Guspaz - Monday, June 30, 2014 - linkThe Dolphin blog post in general is calling out ARM for bad OpenGL ES 3 drivers, not bad desktop OpenGL drivers.