
NULL storage in Oracle

Tags: sql, oracle

I have a table in Oracle 11g Standard Edition One:

table1

col1 col2 col3 col4 col5 col6 col7 col8       col9 col10 col11
1    NULL 2    3    4    5    NULL NULL       19   21    22
1    NULL 2    3    4    5    NULL 1 Jan 2009 19   21    22
1    NULL 2    3    4    5    NULL NULL       19   21    22
1    9    2    3    4    5    A    NULL       19   21    22
1    NULL 2    3    4    5    B    NULL       19   21    22

The table desc is:

Name                 Null Type          
-------------------- ---- ------------- 
COL1                      NUMBER        
COL2                      NUMBER        
COL3                      NUMBER        
COL4                      NUMBER       
COL5                      NUMBER        
COL6                      NUMBER        
COL7                      VARCHAR2(255) 
COL8                      DATE          
COL9                      DATE  
COL10                     DATE        
COL11                     VARCHAR2(255) 

I need to find out what percentage of a table's storage is consumed by NULL values.

For example, if table1 consumes 1 GB of storage and the NULLs inside it consume 100 MB, then NULLs take up 10% of the storage.

Also, are there alternate representations of NULL in Oracle?

asked May 10 '16 16:05 by dang


2 Answers

The NULLs in your table may consume as little as 1.75% of the storage space.

But that number is meaningless, even though it's based on the reproducible test case below. It's more important to understand that NULLs are tiny (just one byte). So tiny that the "real" size should be irrelevant except in extreme cases. So tiny that it is almost always a waste of time to worry about alternate representations.


The best case test case (space usage in practice)

Let's create 1GB of data using your table definition. First, let's create the table.

create table test1(
COL1  NUMBER,
COL2  NUMBER,
COL3  NUMBER,
COL4  NUMBER,
COL5  NUMBER,
COL6  NUMBER,
COL7  VARCHAR2(255),
COL8  DATE,
COL9  DATE,
COL10 DATE,
COL11 VARCHAR2(255)
) pctfree 0 /* Let's assume no updates or deletes, and pack the data tightly */;

Now create one gigabyte of data. Each value uses the largest-possible value for that data type.

begin
    for i in 1 .. 15 loop  --Magic number to generate exactly 1GB.
        insert into test1
        select
            .0123456789012345678901234567890123456789,
            .0123456789012345678901234567890123456789,
            .0123456789012345678901234567890123456789,
            .0123456789012345678901234567890123456789,
            .0123456789012345678901234567890123456789,
            .0123456789012345678901234567890123456789,
            lpad('A', 255, 'A'),
            sysdate,
            sysdate,
            sysdate,
            lpad('A', 255, 'A')
        from dual
        connect by level <= 95000;    --Magic number to generate exactly 1GB.
        commit;
    end loop;
end;
/

These queries show that it uses 1GB of space for 1,425,000 rows.

select count(*) from test1;
select bytes/1024/1024/1024 gb from user_segments where segment_name = 'TEST1';

Now create a second table, with the same number of rows, but a NULL in every column.

create table test1_null as
select col1+null c1, col2+null c2, col3+null c3, col4+null c4, col5+null c5, col6+null c6,
    cast(null as varchar2(255)) c7, col8+null c8, col9+null c9, col10+null c10,
    cast(null as varchar2(255)) c11
from test1;

The new segment size is only 0.0175GB, or 1.75%.

select bytes/1024/1024/1024 gb from user_segments where segment_name = 'TEST1_NULL';
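The reported sizes can also be turned into per-row averages as a sanity check. Below is a minimal Python sketch. It assumes "GB" here means GiB, since the queries divide by 1024 three times, and it uses the rounded 0.0175 figure. The averages include row, block, and segment overhead, not just column data.

```python
# Sanity-check arithmetic on the segment sizes reported by the queries above.
# Assumption: "GB" means GiB, because the queries divide bytes by 1024 three times.
GIB = 1024 ** 3

rows = 15 * 95_000               # PL/SQL loop iterations x rows per CONNECT BY
test1_bytes = 1.0 * GIB          # reported size of TEST1
test1_null_bytes = 0.0175 * GIB  # reported (rounded) size of TEST1_NULL


def bytes_per_row(segment_bytes: float, row_count: int) -> float:
    """Average segment bytes per row; includes row, block and segment overhead."""
    return segment_bytes / row_count


print(rows)                                             # → 1425000
print(round(bytes_per_row(test1_bytes, rows), 1))       # → 753.5
print(round(bytes_per_row(test1_null_bytes, rows), 1))  # → 13.2
print(f"{test1_null_bytes / test1_bytes:.2%}")          # → 1.75%
```

Almost all of the roughly 13 bytes per all-NULL row is therefore overhead and free space, not column data.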

Why that test case is misleading

While this may sound like a simple question, to completely answer it would require either an entire book or a crystal ball. Getting real storage sizes is ridiculously complicated. You'll need to think about at least these issues:

  1. Variable-width data. Most Oracle data types use only the space required to store the data, so the percentage of storage used by that NULL byte depends on precisely what is in the other columns. Only a few data types, such as CHAR, NCHAR, DATE, and TIMESTAMP, use a fixed amount of storage regardless of the data.
  2. Trailing nulls. Consecutive NULLs at the end of a row are not stored at all, unless basic compression is enabled, in which case every NULL uses a byte again.
  3. Row overhead. Every row has overhead that depends on the columns and configuration. The skinnier the table, the larger the share of space taken by row overhead, so the percentage used by a NULL will fluctuate.
  4. Block overhead. This depends on the number of rows, settings like PCTFREE, if previous rows were deleted, when the table was last re-organized, block size, etc.
  5. Segment overhead. Space is allocated in chunks called extents. Extent management can use a default algorithm (which I think allocates in chunks of 1MB up to 64MB) or any custom value. This overhead matters less as the amount of data grows. A tablespace could even be set to a huge uniform extent size, such as 10GB, which will probably waste a lot of space regardless of the column values.
  6. Other I/O overhead. Space is probably also wasted by ASM, the operating system, the SAN, etc.

Format of a Row Piece (space usage in theory)

[Figure: Format of a Row Piece, from the Logical Storage Structures chapter of the Concepts Guide]

The Column Data consists of a series of Column Lengths and Column Values. If the value is NULL, the Column Length is set to 0 and the Column Value does not use any space. This is why a NULL always uses just 1 byte, for the number 0.

Most data types are variable-width, so the length uses at least 1 byte and the value uses at least 1 byte if it's non-NULL. Fixed-width data types, like DATE, still use 1 byte for the length and then 7 bytes for the value. Again, if the date is NULL, the length is set to 0 and the value is empty.

This image also explains the "trailing NULLs" storage trick. When there are trailing NULLs, Oracle sets the Number of Columns in the row header lower and stores nothing for them; any column beyond that count is inferred to be NULL.
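The row piece rules above can be expressed as a small model of the Column Data section. This is a minimal Python sketch, not Oracle's implementation. It assumes every value is short enough for a single length byte, ignores the row header and block overhead, and treats trailing NULLs as not stored, which is how the Oracle Concepts Guide describes them. The value sizes in the example are hypothetical: about 2 bytes for a small NUMBER, 7 for a DATE, and 2 for a two-character VARCHAR2.

```python
from typing import List, Optional, Sequence


def stored_columns(value_sizes: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Columns actually written to the row piece: trailing NULLs (None) are dropped."""
    stored = list(value_sizes)
    while stored and stored[-1] is None:
        stored.pop()
    return stored


def column_data_bytes(value_sizes: Sequence[Optional[int]]) -> int:
    """1 length byte per stored column, plus the value bytes for non-NULLs.

    Assumption: every length fits in one byte (longer values need a longer prefix).
    """
    return sum(1 if size is None else 1 + size for size in stored_columns(value_sizes))


def null_bytes(value_sizes: Sequence[Optional[int]]) -> int:
    """Bytes spent on NULLs: one zero-length byte per NULL that is not trailing."""
    return sum(1 for size in stored_columns(value_sizes) if size is None)


# Hypothetical value sizes for the question's first row (col1..col11).
row = [2, None, 2, 2, 2, 2, None, None, 7, 7, 2]
print(column_data_bytes(row), null_bytes(row))              # → 37 3

# Same values, with the nullable columns moved to the end of the table.
reordered = [2, 2, 2, 2, 2, 7, 7, 2, None, None, None]
print(column_data_bytes(reordered), null_bytes(reordered))  # → 34 0
```

In this model, moving the nullable columns to the end removes the cost of those NULLs entirely. The saving is still only 3 bytes per row.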

Alternative Representations?

Now I'm getting suspicious. Asking about alternative representations of NULLs brings to mind four kinds of people:

  1. Hopelessly theoretical people who complain about violating the relational model and propose using obscure tools instead of the ones that have been working fine for decades.
  2. Data architects who think a ginormous Entity-Attribute-Value table is always the answer. "Hey, it looks good on my PDF, who cares if it's impossible to query?"
  3. Those who are a bit new to SQL and rightfully frustrated with the way NULLs work.
  4. Stackoverflow users who read too much into questions. (So feel free to add information on the background behind this question if I'm way off!)

Yeah, NULLs are a bit weird. But it will make sense soon. Don't worry too much about the space, or ways to completely avoid NULLs. The price you're paying for NULLs is nothing compared to the price you'd pay for anti-patterns that completely avoid them.

answered Nov 10 '22 13:11 by Jon Heller


First, it depends on the table properties (whether it is partitioned, its indexes, data types, LOB fields, etc.), the file system, and some other factors. In the past I had a similar task for Oracle 11. Here are the steps I took (it didn't need to be extremely precise because of the size: the database had more than 3000 tables):

My algorithm

  1. Create a copy of your table without nulls (1000 records);
  2. Create a copy with only nulls (1000 records);
  3. Count the nulls per column with this query (this can be automated to find which columns have the most nulls):

    SELECT COUNT(*) FROM YourTable WHERE YourColumn IS NULL

  4. Create a copy based on the null counts from step 3 (1000 records);

Analyse the results.
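Step 3 can be automated with a single query. Instead of running one `WHERE ... IS NULL` query per column, one scan can count the NULLs in every column. This works because `COUNT(col)` ignores NULLs while `COUNT(*)` does not. The minimal Python sketch below builds such a query. `null_count_query` is a hypothetical helper. It assumes plain, trusted, unquoted column names (for example, taken from `USER_TAB_COLUMNS`) that are short enough for the added `_NULLS` alias to stay within Oracle's identifier length limit.

```python
from typing import Sequence


def null_count_query(table: str, columns: Sequence[str]) -> str:
    """Build one SELECT that counts the NULLs in each column in a single scan.

    COUNT(col) skips NULLs and COUNT(*) counts every row, so the difference
    is the number of NULLs in that column. Names are inserted verbatim, so
    they must be trusted, unquoted identifiers (hypothetical helper).
    """
    select_list = ",\n       ".join(
        f"COUNT(*) - COUNT({col}) AS {col}_NULLS" for col in columns
    )
    return f"SELECT {select_list}\nFROM {table}"


print(null_count_query("TABLE1", ["COL1", "COL2", "COL7"]))
```

Running the generated statement once per table gives all the per-column counts needed for step 4.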

Hope this helps.

Note: At least in my case the goal was to analyse database usage and clean up.

Some further reading on this topic:

Do NULL values increase storage space?

How to calculate row size in a table?

answered Nov 10 '22 14:11 by Vanko