I've got one very specific problem with a project that has been haunting me for days now. I have the following Verilog code for a RAM module:
module RAM_param(clk, addr, read_write, clear, data_in, data_out);
    parameter n = 4;
    parameter w = 8;

    input clk, read_write, clear;
    input [n-1:0] addr;
    input [w-1:0] data_in;
    output reg [w-1:0] data_out;

    reg [w-1:0] reg_array [2**n-1:0];

    integer i;
    initial begin
        for( i = 0; i < 2**n; i = i + 1 ) begin
            reg_array[i] <= 0;
        end
    end

    always @(negedge(clk)) begin
        if( read_write == 1 )
            reg_array[addr] <= data_in;
        if( clear == 1 ) begin
            for( i = 0; i < 2**n; i = i + 1 ) begin
                reg_array[i] <= 0;
            end
        end
        data_out = reg_array[addr];
    end
endmodule
It behaves exactly as expected; however, when I go to synthesize it I get the following:
Synthesizing Unit <RAM_param_1>.
Related source file is "C:\Users\stevendesu\---\RAM_param.v".
n = 11
w = 16
Found 32768-bit register for signal <n2059[32767:0]>.
Found 16-bit 2048-to-1 multiplexer for signal <data_out> created at line 19.
Summary:
inferred 32768 D-type flip-flop(s).
inferred 2049 Multiplexer(s).
Unit <RAM_param_1> synthesized.
32768 flip-flops! Why doesn't it just infer a block RAM? This RAM module is so huge (and I have two of them, one for instruction memory and one for data memory) that it consumes the entire available area of the FPGA... times 2.4.
I've been trying everything to force it to infer a block RAM instead of 33k flip flops, but unless I can get it figured out soon I may have to greatly reduce the size of my memory just to fit on a chip.
If the read address is not registered, or the ram_style attribute is set to distributed, distributed RAM will be inferred. Registering the write data and address, or the read output, has no effect on the choice between block RAM and distributed RAM. (Note: the first "s" in synthesis must be lower case.)
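As a rough illustration of the rule above, here is a minimal sketch of a single-port RAM with a synchronous (registered) read and the ram_style attribute set to block. The module and port names are hypothetical, the posedge clock is an assumption, and the parameters simply reuse n and w from the question; whether a given tool version actually maps this to block RAM should be checked in the synthesis report.

```verilog
// Hypothetical sketch: single-port RAM written so that block RAM can be inferred.
// Module and port names are made up for illustration.
module ram_block_sketch #(
    parameter n = 11,  // address width, as reported in the question's log
    parameter w = 16   // data width, as reported in the question's log
) (
    input  wire         clk,
    input  wire         we,
    input  wire [n-1:0] addr,
    input  wire [w-1:0] data_in,
    output reg  [w-1:0] data_out
);
    // Xilinx ram_style attribute requesting block RAM for this array.
    (* ram_style = "block" *)
    reg [w-1:0] mem [0:(1<<n)-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= data_in;
        // Synchronous read: the read happens inside the clocked block,
        // so a same-address write returns the old contents (read-first).
        data_out <= mem[addr];
    end
endmodule
```

Note that there is no loop touching every word of mem in the clocked block; the memory is only accessed through one address per clock.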
Block RAM (BRAM) is a type of random access memory embedded throughout an FPGA for data storage. You can use BRAM, for example, to transfer data between multiple clock domains using local FIFOs, or to transfer data between an FPGA target and a host processor using a DMA FIFO.
Block RAM is dedicated memory that does not consume any additional LUTs in your design, whereas distributed RAM is built from LUTs. In terms of speed, distributed RAM is faster than block RAM. Generally speaking, if you do not need much RAM, you can consider implementing it as distributed RAM.
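For contrast, a small memory with an unregistered read is the kind of description that typically ends up in LUTs. The following is only a sketch under assumptions: the module and port names are hypothetical, and the explicit ram_style attribute is optional here since the asynchronous read already rules out block RAM.

```verilog
// Hypothetical sketch: small LUT-based (distributed) RAM with asynchronous read.
module ram_distributed_sketch #(
    parameter n = 4,  // address width (16 words, the question's defaults)
    parameter w = 8   // data width
) (
    input  wire         clk,
    input  wire         we,
    input  wire [n-1:0] addr,
    input  wire [w-1:0] data_in,
    output wire [w-1:0] data_out
);
    // Xilinx ram_style attribute requesting LUT-based RAM.
    (* ram_style = "distributed" *)
    reg [w-1:0] mem [0:(1<<n)-1];

    // Writes are still synchronous.
    always @(posedge clk)
        if (we)
            mem[addr] <= data_in;

    // Asynchronous (combinational) read: the read address is not registered.
    assign data_out = mem[addr];
endmodule
```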
I just removed part of your code (the clear loop); the result looks like this:
module RAM_param(clk, addr, read_write, clear, data_in, data_out);
    parameter n = 4;
    parameter w = 8;

    input clk, read_write, clear;
    input [n-1:0] addr;
    input [w-1:0] data_in;
    output reg [w-1:0] data_out;

    // Start module here!
    reg [w-1:0] reg_array [2**n-1:0];

    integer i;
    initial begin
        for( i = 0; i < 2**n; i = i + 1 ) begin
            reg_array[i] <= 0;
        end
    end

    always @(negedge(clk)) begin
        if( read_write == 1 )
            reg_array[addr] <= data_in;
        //if( clear == 1 ) begin
            //for( i = 0; i < 2**n; i = i + 1 ) begin
                //reg_array[i] <= 0;
            //end
        //end
        data_out = reg_array[addr];
    end
endmodule
Initializing to all zeros may not need any code. If you do want to initialize the memory from a file, just do this:
initial
begin
    $readmemb("data.dat", reg_array);
end
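To make the file-based initialization a bit more concrete, here is a minimal sketch. The module name and the INIT_FILE parameter are hypothetical; "data.dat" is just the file name used above. $readmemb expects binary words separated by whitespace (one word per line is common), allows // comments, and accepts @ followed by a hex address to jump to a location.

```verilog
// Hypothetical sketch: RAM whose initial contents come from a text file.
//
// Example contents of data.dat for w = 8 (binary words, one per line):
//   00000001
//   00000010
//   @4          // continue loading at address 4 (hex)
//   11110000
module ram_init_sketch #(
    parameter n = 4,
    parameter w = 8,
    parameter INIT_FILE = "data.dat"  // assumed file name and location
) (
    input  wire         clk,
    input  wire         we,
    input  wire [n-1:0] addr,
    input  wire [w-1:0] data_in,
    output reg  [w-1:0] data_out
);
    reg [w-1:0] reg_array [0:(1<<n)-1];

    // Load the initial memory image once, at time zero.
    initial $readmemb(INIT_FILE, reg_array);

    always @(posedge clk) begin
        if (we)
            reg_array[addr] <= data_in;
        data_out <= reg_array[addr];
    end
endmodule
```

Words that the file does not mention keep their default value, so check what your tool does with unlisted addresses before relying on them being zero.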
This is the result I got from ISE 13.1:
Synthesizing (advanced) Unit <RAM_param>.
INFO:Xst:3231 - The small RAM <Mram_reg_array> will be implemented on LUTs in order to maximize performance and save block RAM resources. If you want to force its implementation on block, use option/constraint ram_style.
-----------------------------------------------------------------------
| ram_type | Distributed | |
-----------------------------------------------------------------------
| Port A |
| aspect ratio | 16-word x 8-bit | |
| clkA | connected to signal <clk> | fall |
| weA | connected to signal <read_write> | high |
| addrA | connected to signal <addr> | |
| diA | connected to signal <data_in> | |
| doA | connected to internal node |
Update: many thanks to mcleod_ideafix. Sorry, I forgot what your question was actually asking for: block RAM, not distributed RAM. For block RAM, you must force it: Synthesis - XST -> Process Properties -> HDL option -> RAM style -> change from auto to Block. The result will be this:
Synthesizing (advanced) Unit <RAM_param>.
INFO:Xst:3226 - The RAM <Mram_reg_array> will be implemented as a BLOCK RAM, absorbing the following register(s): <data_out>
-----------------------------------------------------------------------
| ram_type | Block | |
-----------------------------------------------------------------------
| Port A |
| aspect ratio | 16-word x 8-bit | |
| mode | read-first | |
| clkA | connected to signal <clk> | fall |
| weA | connected to signal <read_write> | high |
| addrA | connected to signal <addr> | |
| diA | connected to signal <data_in> | |
| doA | connected to signal <data_out> | |
-----------------------------------------------------------------------
| optimization | speed | |
-----------------------------------------------------------------------
Unit <RAM_param> synthesized (advanced).
End of Update
I recommend reading the XST user guide for RAM sample code, as well as the device data sheet. For example, in some FPGAs the LUT RAM has no valid reset signal. If you try to reset it anyway, extra reset logic has to be built around it, and that leads to D flip-flops being inferred instead of RAM. The reset signal will be auto-assigned to the system reset.
For block RAM (not LUT RAM), I prefer to specify the depth and data width explicitly, generate it with the core generator, or instantiate it directly from the library. More source code for general usage (ASIC/FPGA) can be found here: http://asic-world.com/examples/verilog/ram_dp_sr_sw.html
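If a clear is really required, one common workaround (not something the answer above prescribes, just a sketch under assumptions) is to clear the memory one word per clock through the normal write port, so the array is still only accessed at a single address per cycle. All names here are hypothetical, the posedge clock is an assumption, and the clear takes 2**n cycles, during which busy is high and normal accesses are ignored.

```verilog
// Hypothetical sketch: RAM with a sequential clear instead of a one-cycle clear.
module ram_seq_clear_sketch #(
    parameter n = 11,
    parameter w = 16
) (
    input  wire         clk,
    input  wire         clear,    // request to start a clear sweep
    input  wire         we,
    input  wire [n-1:0] addr,
    input  wire [w-1:0] data_in,
    output reg  [w-1:0] data_out,
    output wire         busy      // high while the sweep is running
);
    reg [w-1:0] mem [0:(1<<n)-1];

    reg         clearing = 1'b0;
    reg [n-1:0] clr_addr = {n{1'b0}};

    // The single RAM port is shared between the sweep and normal accesses.
    wire         ram_we   = clearing | we;
    wire [n-1:0] ram_addr = clearing ? clr_addr  : addr;
    wire [w-1:0] ram_din  = clearing ? {w{1'b0}} : data_in;

    assign busy = clearing;

    // Sweep control: walk clr_addr from 0 to 2**n - 1, then stop.
    always @(posedge clk) begin
        if (clear && !clearing) begin
            clearing <= 1'b1;
            clr_addr <= {n{1'b0}};
        end else if (clearing) begin
            if (clr_addr == {n{1'b1}})
                clearing <= 1'b0;
            clr_addr <= clr_addr + 1'b1;
        end
    end

    // The memory itself: one write and one registered read per clock.
    always @(posedge clk) begin
        if (ram_we)
            mem[ram_addr] <= ram_din;
        data_out <= mem[ram_addr];
    end
endmodule
```

The trade-off is latency: the surrounding design has to wait for busy to drop before using the memory again.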