Last edited by cjsb37 on 2013-4-29 09:16
File Name: dct_122700.zip
File Contents: read_me.txt
mds_def.h
r8x8dct.asm
tr8x8dct.c
tr8x8dct.h
Module Name: The implementation of forward DCT for 8x8 real data.
Label Name: __r8x8dct
Description: This is an implementation of Chen's algorithm for the DCT.
It relies on the separability of the multi-dimensional DCT.
The input is an 8x8 matrix of real data. First, a one-dimensional
8-point DCT is calculated for each of the 8 rows, and the output is
stored, transposed, in a separate matrix. Then an 8-point DCT is
calculated for each row of that matrix, and the output is again
stored transposed. This is the final output.
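The row pass, transpose, row pass, transpose sequence described above can be sketched in plain C. This is a floating-point reference for checking results, not the fixed-point Chen implementation in r8x8dct.asm. The function names and the orthonormal scaling are assumptions made for this sketch and may differ from the scaling used by the assembly routine.

```c
#define N 8

/* cos(m*pi/16) for m = 0..8; other angles follow from symmetry. */
static const double COS16[9] = {
    1.0, 0.98078528040323044, 0.92387953251128674, 0.83146961230254524,
    0.70710678118654752, 0.55557023301960222, 0.38268343236508977,
    0.19509032201612827, 0.0
};

/* Returns cos(m*pi/16) for any non-negative m. */
static double cos_pi16(int m)
{
    m %= 32;                                     /* period 2*pi = 32 steps */
    if (m > 16) m = 32 - m;                      /* cos(2*pi - x) = cos(x) */
    return (m > 8) ? -COS16[16 - m] : COS16[m];  /* cos(pi - x) = -cos(x) */
}

/* Direct 8-point DCT-II with orthonormal scaling (assumed for this sketch). */
static void dct8_ref(const double in[N], double out[N])
{
    for (int k = 0; k < N; k++) {
        double sum = 0.0;
        for (int n = 0; n < N; n++)
            sum += in[n] * cos_pi16((2 * n + 1) * k);
        out[k] = sum * (k == 0 ? 0.35355339059327379 : 0.5);
    }
}

/* One pass: DCT every row of src and store the results transposed in dst. */
static void dct_rows_transposed(double src[N][N], double dst[N][N])
{
    double row[N];
    for (int r = 0; r < N; r++) {
        dct8_ref(src[r], row);
        for (int k = 0; k < N; k++)
            dst[k][r] = row[k];
    }
}

/* 2-D DCT as two row passes; the second transpose restores the
   original orientation, and the result overwrites block. */
void dct8x8_ref(double block[N][N], double temp[N][N])
{
    dct_rows_transposed(block, temp);
    dct_rows_transposed(temp, block);
}
```

With this scaling, a block filled with a constant value c yields a DC term of 8c and zeros elsewhere, which makes a quick sanity check.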
Chen's algorithm is implemented in 4 stages. The first stage
consists of additions and subtractions only. The second stage
performs additions and subtractions together with one multiplication.
The third and fourth (last) stages involve more MAC operations.
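As an illustration of the first stage, the sketch below shows the butterfly that fast 8-point DCTs conventionally start with: sample i is paired with sample 7-i. That r8x8dct.asm arranges its first stage exactly this way is an assumption, and the function name is hypothetical.

```c
#include <stdint.h>

/* First-stage butterfly of an 8-point fast DCT (standard arrangement,
   assumed here). The sums feed the even-indexed outputs and the
   differences feed the odd-indexed outputs. With inputs in the range
   -256..255 the results fit comfortably in 16 bits. */
void dct8_stage1(const int16_t x[8], int16_t sum[4], int16_t diff[4])
{
    for (int i = 0; i < 4; i++) {
        sum[i]  = (int16_t)(x[i] + x[7 - i]);
        diff[i] = (int16_t)(x[i] - x[7 - i]);
    }
}
```

No multiplications are needed at this point, which matches the description of the first stage as additions and subtractions only.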
This implementation works only for 8x8 input. The input data
must be real and in the range -256 to 255.
The algorithm works in place: the final output overwrites the input buffer.
The prototype of the C-callable function is as follows:
_r8x8dct(fract16 *in, fract16 *coeff, fract16 *temp);
*in -> Pointer to Input vector.
*coeff -> Pointer to coefficients.
*temp -> Pointer to temporary data.
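Before calling the routine, a caller may want to confirm that the input respects the documented range. The helper below is a minimal sketch: the typedef of fract16 as a 16-bit signed integer is an assumption (the real definition comes from the toolchain headers), and dct_input_in_range is a hypothetical name that is not part of the package.

```c
#include <stdint.h>

typedef int16_t fract16;   /* assumption: 16-bit signed fractional type */

#define DCT_N 8

/* Buffer sizes taken from the "Memory Required" figures:
 *   fract16 in[64];    8 * 8 * 2 bytes, overwritten with the result
 *   fract16 coeff[8];  16 bytes; contents are not described in this read_me
 *   fract16 temp[64];  8 * 8 * 2 bytes of scratch space
 */

/* Returns 1 if every sample lies in the documented range -256..255,
   otherwise 0. */
int dct_input_in_range(const fract16 *in)
{
    for (int i = 0; i < DCT_N * DCT_N; i++)
        if (in[i] < -256 || in[i] > 255)
            return 0;
    return 1;
}
```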
Note: The algorithm reads the input data from the "in" matrix.
First, an 8-point DCT is calculated for each of the 8 rows.
The output is stored in the "temp" buffer in transposed
form at bit-reversed locations.
Then the 8-point DCT is applied to each of the 8 rows of the
"temp" buffer. The final output is stored in the "in"
buffer in transposed form at bit-reversed locations.
The matrix transpose and the bit-reversed addressing
are carried out while writing the data, without
any explicit code.
The output of the function is provided in the "in" buffer in normal order.
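The combined transposed and bit-reversed store can be pictured in C as follows. The assembly routine presumably gets this effect from how its store addresses are generated, so the explicit index arithmetic below is only a model. It assumes each 8-point pass delivers its results in bit-reversed order, and both function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the 3 bits of an index 0..7: 0,1,2,...,7 -> 0,4,2,6,1,5,3,7. */
static unsigned bitrev3(unsigned k)
{
    return ((k & 1u) << 2) | (k & 2u) | ((k & 4u) >> 2);
}

/* Store one row of 8-point results, assumed to arrive in bit-reversed
   order, as column `row` of the 8x8 matrix dst. After all 8 rows have
   been stored, dst holds the transposed result in natural order. */
void store_transposed_bitrev(const int16_t stage_out[8], unsigned row,
                             int16_t dst[64])
{
    for (unsigned k = 0; k < 8; k++)
        dst[bitrev3(k) * 8 + row] = stage_out[k];
}
```

Applying the same store after the second pass, with "temp" as the source and "in" as the destination, undoes the first transpose, which is consistent with the final output appearing in "in" in normal order.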
Registers Used: R0, R1, R2, R3, R4, R5, R6, R7, P0, P1, P2, P3, P4, P5, A0, A1.
Other Registers Used: I0, I1, I2, I3, B0, B2, B3, M0, M1, L3 and LC0.
Performance: (Timer version 0.6.33)
Code Size : 240 Bytes.
Memory Required :
Input Matrix : 8 * 8 * 2 Bytes.
Coefficients : 16 Bytes
Temporary matrix : 8 * 8 * 2 Bytes.
Cycle Count :
-----------------------------------------
| Size | Forward DCT | Inverse DCT |
-----------------------------------------
| 8x8 | 284 Cycles | 311 Cycles |
-----------------------------------------