Sunday 22 July 2012
SQL Server Data Storage – Extent and Data Pages
For storing information in database we use Table. For that we create table by writing Create Table statement. But before we do that, let’s take a look at how SQL Server stores data.
In SQL SERVER data is stored in the form of Data Pages. Basic unit of storage to provide space for tables is Extent. Every Extent has number of Data Pages. When new records are inserted new data pages are allocated. Each Extent consists of eight data pages. New Extent will be allocated after all the eight pages are consumed. While Extent is the basic storage unit from database point of view, page is a unit of allocation within Extent.
Everything is stored on 8K Data Pages (technically 8060 bytes are available). 8 Data Pages (64K) combines into an extent.
This means 1 EXTENT = collection of 8 Data Pages.
The Extent is the basic unit of allocation for tables and indexes and all objects are saved in a table of some kind, all objects are stored in extents.
Fig. 1 – Extent and Data Page
SQL Server has two types of extents:
» Uniform: In Uniform Extents, all eight pages are used by the same object or stores
data pages belong to the one object.
» Mixed: Mixed Extents are used by objects that are too small to take up eight
pages, so more than one object is stored in the extent or stores data
pages belong to the different objects.First 8 pages for the objects are stored in the mixed extents, after that only uniform extents are used. All space allocation is done based on extents - 64K blocks regardless of the type (mixed or uniform).
» Uniform: In Uniform Extents, all eight pages are used by the same object or stores
data pages belong to the one object.
» Mixed: Mixed Extents are used by objects that are too small to take up eight
pages, so more than one object is stored in the extent or stores data
pages belong to the different objects.First 8 pages for the objects are stored in the mixed extents, after that only uniform extents are used. All space allocation is done based on extents - 64K blocks regardless of the type (mixed or uniform).
Fig.-2: Extent
A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.
When a table or an index needs additional storage space, another extent is allocated to that object. A new extent will generally not be allocated for a table or index until all pages on that extent have been used. This process of allocating extents rather than individual pages to objects serves two useful purposes. First, the time-consuming process of allocation takes place in one batch rather than forcing each allocation to occur whenever a new page is needed. Second, it forces the pages allocated to an object to be at least somewhat contiguous. If pages were allocated directly, on an as needed basis, then pages belonging to a single object would not be next to each other in the data file. Page 1 might belong to table 1, page 2 might belong to index 3, page 3 might belong to table 5, and so on. This is called fragmentation. Fragmentation can have a significant negative impact on performance. When pages for a single object are contiguous, though, reads and writes can occur much more quickly.
Pages are contained in Extent. Every extent will have around eight data pages. But all the eight data pages are not created at once; it’s created depending on data demand. So when a pages becomes full it creates a new page, this process is called “Page Split”..
At the most fundamental level, everything in SQL Server is stored on an 8KB page. The page is the one common denominator for all objects in SQL Server. Many types of pages exist, but every page has some factors in common. Pages are always 8KB in size and always have a header, leaving about 8,060 bytes of usable space on every page
There are a few special data pages types SQL Server is using to track extents allocation. Those pages are basically bitmaps – every bit handles one extent. So one page can cover 64,000 extents or almost 4 GB of data. Let’s take a quick look at them:
SQL Server has eight primary types of pages:
v Data pages:
Data pages hold the actual database records. The data page is 8,192 bytes, but only 8,060 of those bytes are available for data storage because a header at the beginning of each data page contains information about the page itself. Rows are not allowed to span more than one page, but if you have variable-length columns that exceed this limit, you can move them to a page in the ROW_OVERFLOW_DATA allocation unit.
v Index pages:
Index pages store the index keys and levels making up the entire index tree. Unlike data pages, you have no limit for the total number of entries you can make on an index page.
v Text/Image pages:
Text and image pages hold the actual data associated with text, ntext, and image data types. When a text field is saved, the record will contain a 16-byte pointer to a linked list of text pages that hold the actual text data. Only the 16-byte pointer inside the record is counted against the 8,060-byte record-size limit.
v Global Allocation Map pages:
The Global Allocation Map (GAM) page type keeps track of which extents in a data file are allocated and which are still available.
v Index Allocation Map pages:
Index Allocation Map (IAM) pages keep track of what an extent is being used for—specifically, to which table or index the extent has been allocated.
v Page Free Space pages:
This is not an empty page; rather, it is a special type of page that keeps track of free space on all the other pages in the database. Each Page Free Space page can keep track of the amount of free space of up to 8,088 pages or about 64Mb of data.
v Bulk Changed Map pages: This page contains information about other pages that have been modified by bulk operations (such as BULK INSERT) since the last BACKUP LOG statement.
v Differential Changed Map pages:
This page contains information about other pages that have changes since the
last BACKUP DATABASE statement
This page contains information about other pages that have changes since the
last BACKUP DATABASE statement
v Shared Global Allocation Map (SGAM):
It tracks if extents are mixed extent and have at least one data page available for use.
Page has three important sections:-
Ø Page Header
Ø Actual Data i.e. Data Row
Ø Row pointers or Row Offset
Fig.-3 - Data Page
Let’s dive into one level deeper and take a look at the structure of the data page.
Fig.- 4- Data Page In Details
First 2 bytes contain header information. Next 2 bytes store the length of the fixed width data following by the data. Next are 2 bytes for the number of columns. Next, null bitmask (1 byte per 8 nullable columns). It follows by 2 bytes store number of variable width columns, variable width column offset array (2 bytes per variable column) and variable width data. And finally there is the optional 14 bytes pointer to the version store. This one is used for optimistic isolation levels (snapshot, read committed snapshot), MARS, etc.
First of all, fixed width data always uses space even when null. Variable width data uses 2 extra bytes for offset storage for every value. Null values of variable width data are not stored although there are still 2 bytes in the offset array unless null values are last in the row.
So where does it lead us? Let’s think about the table which accepts some transactional data.Create table dbo.AppData
(
LastTranDate Datetime Not Null,
Amount float not null,
IsApproved int not null,
IsPending int not null,
LMDT Datetime not null constraint DEF_ AppData_LMDT
default (getDate())
)
5 fields we have in this table require 8 + 8 + 4 + 4 + 8 = 32 bytes. Now let’s think for a minute. Do we really need to store
Let’s modify the table a little bit:LastTranDate
date/time with precision up to 3 milliseconds? Would 1 minute be OK? Same about LMDT column. Can we use 1 second precision? What about Amount? Can we use smallmoney or maybe decimal(9,3)?Create table dbo.AppData
(
LastTranDate
smalldatetime
Not Null,
Amount
decimal(9,3)
not null,
IsApproved BIT not null,
IsPending BIT not null,
LMDT Datetime2(0) not null constraint DEF_ AppData_LMDT
default (getDate())
)
Now it requires: 4 + 5 + 1 + 0 (8 bit fields shares 1 byte of storage space) + 6 = 16 bytes. We ended up with 16 bytes of saving. Not much. On other hand, this is about 16K per 1000 rows. Or about 16Mb per 1M rows. And what if system collects 1M rows per day? It would be ~5.8Gb per year. What if you have 50M rows per day..?
Finally it’s not only about the storage size. It greatly affects performance of the system because of the extra IO operations and other things also.
So always we have use correct data types.
No comments:
Post a Comment