Last edited by cjsb37 on 2013-4-29 09:28
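DSP development often calls for assembly language, so writing efficient assembly code is part of a DSP engineer's daily work, and a good DSP engineer will go to great lengths to save a single clock cycle. In critical code, every clock cycle saved can pay off handsomely, and sometimes it decides whether a project succeeds or fails. As a saying among us DSP practitioners goes: "Every cycle makes a difference."
What follows is an analysis of a simple computation. This small piece of code shows that efficient code and poor code are only one step apart, and it is the DSP engineer's job to turn all poor code into efficient programs.
This article (like the other articles on this forum) is written mainly in English. For members less comfortable with English, the forum will run a separate series, 《科技英语一周一贴》 ("Technical English: One Post a Week"); you are welcome to take part and ask questions.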
The development of efficient assembly language code shows how efficient a DSP processor can be: each assembler instruction is performing several useful operations. But it also shows how difficult it can be to program such a specialised processor efficiently.
C code:
temp = (*c_ptr++) * (*x_ptr--);
for (k = 1; k < N-1; k++)
    temp = temp + (*c_ptr++) * (*x_ptr--);
*y_ptr++ = temp;

The same computation in assembly language:
a1 = *r3++ * *r4--
do 0,r1
a1 = a1 + *r3++ * *r4--
*r2++ = a1
Bear in mind that we use DSP processors to do specialised jobs fast. If cost is no object, then it may be permissible to throw away processor power by inefficient coding: but in that case we would perhaps be better advised to choose an easier processor to program in the first place. A sensible reason to use a DSP processor is to perform DSP either at lowest cost, or at highest speed. In either case, wasting processor power leads to a need for more hardware which makes a more expensive system which leads to a more expensive final product which, in a sane world, would lead to loss of sales to a competitive product that was better designed.
One example (see figure) shows how essential it is to make sure a DSP processor is programmed efficiently:
The diagram shows a single assembler instruction from the Lucent DSP32C processor. This instruction does a lot of things at once:
•two arithmetic operations (an add and a multiply)
•three memory accesses (two reads and a write)
•one floating point register update
•three address pointer increments
All of these operations can be done in one instruction. This is how the processor can be made fast. But if we fail to use any one of these operations, we are throwing away part of the processor's potential and may be slowing our program down drastically. Consider how this instruction translates into MIPS or Mflops.
The processor runs with an 80 MHz clock. But, to achieve four memory accesses per instruction, it uses a modified von Neumann memory architecture which requires it to divide the system clock by four, resulting in an instruction rate of 20 MIPS. If we go into manic marketing mode, we can have fun working out ever higher MIPS or MOPS ratings as follows:
80 MHz clock
20 MIPS = 20 MOPS
but 2 floating point operations per instruction = 40 MOPS
and four memory accesses per instruction = 80 MOPS
plus three pointer increments per instruction = 60 MOPS
plus one floating point register update = 20 MOPS
making a grand total MOPS rating of 200 MOPS
Which exercise serves to illustrate three things:
•MIPS, MOPS and Mflops are misleading measures of DSP power
•marketing men can squeeze astonishing figures out of nothing
Of course, we did not include in the MOPS rating (as some manufacturers do) the DMA available on the serial and parallel ports, with all the associated DMA address pointer increments; and if we had multiple comm ports, each with DMA, we could go really wild...
Apart from a cheap laugh at the expense of marketing, there is a very serious lesson to be drawn from this exercise. Suppose we only did adds with this processor. Then the Mflops rating falls from a respectable 40 Mflops to a pitiful 20 Mflops. And if we also leave the memory accesses, pointer increments and register update unused, the MOPS rating collapses from 200 MOPS to just 20 MOPS.
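Both drops follow from the same rule: each rating is the 20 MIPS instruction rate multiplied by the number of operations of that kind that each instruction actually performs. Using the counts from the tally above:

```latex
\begin{aligned}
\text{Mflops}_{\text{peak}} &= 20\,\text{MIPS} \times 2 = 40, &
\text{Mflops}_{\text{adds only}} &= 20\,\text{MIPS} \times 1 = 20,\\
\text{MOPS}_{\text{peak}} &= 20\,\text{MIPS} \times (2 + 4 + 3 + 1) = 200, &
\text{MOPS}_{\text{one op}} &= 20\,\text{MIPS} \times 1 = 20.
\end{aligned}
```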
It is very easy indeed to write very inefficient DSP code. Luckily it is also quite easy, with a little care, to write very efficient DSP code.