Windows provides pools of paged and
nonpaged memory that drivers and other components can allocate. The Executive
component of the operating system manages the memory in the pools and exposes
the ExAllocatePoolXxx functions for use by drivers. Pool memory is a subset of
available memory and is not necessarily contiguous. The size of each pool is
limited; it depends on the amount of physical memory that is available and
varies greatly among Windows releases.
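For example, a driver might allocate and free a general-purpose buffer with ExAllocatePoolWithTag, one of the ExAllocatePoolXxx family. The following is a minimal sketch; the buffer size and the 'MyDd' tag are illustrative:

PVOID buffer;

/* Allocate a 256-byte buffer from the nonpaged pool. The four-character */
/* tag identifies this driver's allocations in debugging and             */
/* pool-tracking tools ('MyDd' is an illustrative tag).                  */
buffer = ExAllocatePoolWithTag(NonPagedPool, 256, 'MyDd');
if (buffer != NULL) {
    /* ... use the buffer ... */
    ExFreePoolWithTag(buffer, 'MyDd');
}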
The paged pool is exactly what its name
implies: a region of virtual memory that is subject to paging. The size of the
paged pool is limited and depends on both the amount of available physical
memory on each individual machine and the specific operating system release.
For example, the maximum size of the paged pool is about 491 MB on 32-bit
hardware running Windows XP and about 650 MB on Windows Server 2003 SP1.
The nonpaged pool is a region of system
virtual memory that is not subject to paging. Drivers use the nonpaged pool for
many of their storage requirements because it can be accessed at any IRQL. Like
the paged pool, the nonpaged pool is limited in size. On a 32-bit x86 system that
is started without the /3GB switch, the nonpaged pool is limited to 256 MB; with
the /3GB switch, the limit is 128 MB. On 64-bit systems, the nonpaged pool
currently has a limit of 128 GB.
The pool sizes and maximums may vary
greatly for different Windows releases.
IRQL Considerations
When you design your driver, keep in mind
that the system cannot service a page fault at IRQL DISPATCH_LEVEL or higher.
Therefore, drivers must use nonpaged pool for any data that can be accessed at
DISPATCH_LEVEL or higher. You cannot move a buffer that was allocated from the
paged pool into the nonpaged pool, but you can lock a paged buffer into memory
so that it is temporarily nonpaged.
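For example, a driver that must touch a pageable buffer at DISPATCH_LEVEL can lock it down beforehand at PASSIVE_LEVEL by using an MDL. In the following sketch, PagedBuffer and BufferLength are illustrative names for the pageable buffer and its size:

PMDL mdl;

/* At PASSIVE_LEVEL: describe the paged buffer with an MDL and lock */
/* its pages so that they stay resident.                            */
mdl = IoAllocateMdl(PagedBuffer, BufferLength, FALSE, FALSE, NULL);
if (mdl != NULL) {
    __try {
        MmProbeAndLockPages(mdl, KernelMode, IoWriteAccess);
        /* The buffer can now be safely accessed at DISPATCH_LEVEL. */
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(mdl);
        mdl = NULL;
    }
}

/* ... later, after the last access at raised IRQL ... */
if (mdl != NULL) {
    MmUnlockPages(mdl);
    IoFreeMdl(mdl);
}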
Locks must never be allocated in the
paged pool because the system accesses them at DISPATCH_LEVEL or higher, even
if the locks are used to synchronize code that runs below DISPATCH_LEVEL.
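For example, a spin lock must live in nonpaged storage even if the code it guards runs at PASSIVE_LEVEL. A minimal sketch, in which the context structure and tag are illustrative:

typedef struct _MY_CONTEXT {
    KSPIN_LOCK Lock;     /* accessed at DISPATCH_LEVEL; must be nonpaged */
    LIST_ENTRY Queue;    /* data that the lock protects */
} MY_CONTEXT, *PMY_CONTEXT;

PMY_CONTEXT context;

/* NonPagedPool, never PagedPool, for a structure that contains a lock. */
context = (PMY_CONTEXT)ExAllocatePoolWithTag(NonPagedPool,
                                             sizeof(MY_CONTEXT), 'MyCx');
if (context != NULL) {
    KeInitializeSpinLock(&context->Lock);
    InitializeListHead(&context->Queue);
}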
Storage for the following items can
generally be allocated from the paged pool, depending on how the driver uses
them:
·         Information about device resources, relations, capabilities, interfaces, and other details that are handled in IRP_MN_QUERY_* Plug and Play requests. The Plug and Play manager sends all of these queries at PASSIVE_LEVEL, so unless the driver must reference this information at a higher IRQL, it can safely store the data in paged memory.
·         The registry path passed to DriverEntry. Some drivers save this path for use during WMI initialization, which occurs at PASSIVE_LEVEL.
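For example, a driver can copy the registry path into a paged-pool buffer in DriverEntry, because the copy is made and later used only at PASSIVE_LEVEL. This is a sketch; g_RegistryPath and the tag are illustrative:

UNICODE_STRING g_RegistryPath;   /* illustrative global */

NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    /* Paged pool is fine here: DriverEntry and later WMI */
    /* initialization both run at PASSIVE_LEVEL.          */
    g_RegistryPath.Length = 0;
    g_RegistryPath.MaximumLength = RegistryPath->Length + sizeof(WCHAR);
    g_RegistryPath.Buffer = (PWCH)ExAllocatePoolWithTag(PagedPool,
                                g_RegistryPath.MaximumLength, 'MyRp');
    if (g_RegistryPath.Buffer == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }
    RtlCopyUnicodeString(&g_RegistryPath, RegistryPath);
    /* Free g_RegistryPath.Buffer in the driver's unload routine. */

    /* ... remainder of initialization ... */
    return STATUS_SUCCESS;
}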
While running at DISPATCH_LEVEL or below,
a driver can allocate memory from the nonpaged pool. A driver can allocate
paged pool only while it is running at PASSIVE_LEVEL or APC_LEVEL because
APC_LEVEL synchronization is used within the pool manager code for pageable
requests. Furthermore, if the paged pool is nonresident, accessing it at DISPATCH_LEVEL—even
to allocate it—would cause a fatal bug check.
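A debug-time check can catch violations of this rule. For example, in a sketch using the checked-build ASSERT macro (the size and tag are illustrative):

PVOID  buffer;
SIZE_T size = 64;   /* illustrative */

/* Paged-pool allocations are legal only at PASSIVE_LEVEL or APC_LEVEL. */
ASSERT(KeGetCurrentIrql() <= APC_LEVEL);
buffer = ExAllocatePoolWithTag(PagedPool, size, 'MyPg');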
For a complete list of standard driver
routines and the IRQL at which each is called, see “Scheduling, Thread Context,
and IRQL,” which is listed in the Resources section at the end of this paper.
In addition, the Windows DDK lists the IRQL at which system and driver routines
can be called.
Lookaside Lists
A lookaside list is a set of fixed-size, reusable
buffers, designed for structures that a driver might need to allocate
dynamically and frequently. The driver defines the size, layout, and contents
of the entries in the list to suit its requirements, and the system maintains
list status and adjusts the number of available entries according to demand.
When a driver initializes a lookaside
list, Windows creates the list and holds the buffers in reserve for future use
by the driver. The number of buffers that are in the list at any given time
depends on the amount of available memory and the size of the buffers. Lookaside
lists are useful whenever a driver needs fixed-size buffers and are especially appropriate
for commonly used and reused structures, such as I/O request packets (IRPs). The
I/O manager allocates its own IRPs from a lookaside list.
A lookaside list can be allocated from either
the paged or the nonpaged pool, according to the driver’s requirements. After the
list has been initialized, all buffers from the list come from the same pool.
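The following sketch shows a typical life cycle for a nonpaged lookaside list; the entry structure, names, and tag are illustrative:

NPAGED_LOOKASIDE_LIST g_RequestLookaside;

typedef struct _MY_REQUEST {
    LIST_ENTRY Link;
    ULONG      Operation;
} MY_REQUEST, *PMY_REQUEST;

/* Initialize once, typically in DriverEntry. The system adjusts the */
/* list's reserve of buffers according to demand.                    */
ExInitializeNPagedLookasideList(&g_RequestLookaside,
                                NULL,      /* default allocate routine */
                                NULL,      /* default free routine */
                                0,         /* flags */
                                sizeof(MY_REQUEST),
                                'MyRq',    /* pool tag */
                                0);        /* depth: reserved, must be 0 */

/* Allocate and free entries as needed. */
PMY_REQUEST request = (PMY_REQUEST)
    ExAllocateFromNPagedLookasideList(&g_RequestLookaside);
if (request != NULL) {
    /* ... use the entry ... */
    ExFreeToNPagedLookasideList(&g_RequestLookaside, request);
}

/* Delete the list when the driver unloads. */
ExDeleteNPagedLookasideList(&g_RequestLookaside);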
Caching
Drivers
can allocate cached or noncached memory. Caching improves performance,
especially for access to frequently used data. As a general rule, drivers
should allocate cached memory. The x86, x64, and Itanium architectures all
support cache-coherent DMA, so drivers can safely use cached memory for DMA
buffers.
Drivers
rarely require noncached memory. A driver should allocate no more noncached
memory than it needs and should free the memory as soon as it is no longer
required.
Alignment
The alignment of the data structures in a
driver can have a big impact on the driver’s performance and efficiency. Two
types of alignment are important:
·         Natural alignment for the data size
·         Cache-line alignment
Natural Alignment
Natural alignment means aligning data according
to its type. The Microsoft C compiler aligns individual data items on an
appropriate boundary for their size. For example, UCHARs are aligned on 1-byte
boundaries, and ints, LONGs, and ULONGs on 4-byte boundaries.
Individual data items within a structure
are also naturally aligned; the compiler adds padding bytes if required. When
you compile, structures are aligned according to the alignment requirements of
their largest member. Unions are aligned according to the requirements of
their most strictly aligned member. When you compile a 32-bit driver,
pointers are 32 bits wide and occupy 4 bytes. When you compile a 64-bit driver,
pointers are 64 bits wide and occupy 8 bytes. A structure that contains a pointer,
therefore, might require different amounts of padding on 32-bit and 64-bit
systems. If the structure is used only internally within the driver,
differences in padding are not important. However, you must ensure that the
padding is the same on 32-bit and 64-bit systems in either of the following situations:
·         The structure is used by both 32-bit and 64-bit processes running on a 64-bit machine.
·         The structure might be passed to or used on 32-bit hardware as a result of being saved on disk, sent over the network, or used in a device I/O control request (IOCTL).
You can resolve this issue by using
pragmas (as described below) or by adding explicit dummy variables to the
structure just for padding. For cross-platform compatibility, you should explicitly
align data on 8-byte boundaries on both 64-bit and 32-bit systems.
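For example, a structure passed in an IOCTL can use fixed-size types and an explicit padding field so that its layout is identical in 32-bit and 64-bit builds. This sketch is illustrative:

typedef struct _IOCTL_PARAMS {
    ULONG   Flags;      /* offset 0 on both platforms */
    ULONG   Reserved;   /* explicit padding: guarantees that Value is  */
                        /* at offset 8 regardless of packing settings  */
    ULONG64 Value;      /* fixed-size rather than pointer-sized, so the */
                        /* member is identical in 32- and 64-bit builds */
} IOCTL_PARAMS;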
Proper alignment enables the processor to
access data in the minimum number of operations. For example, a 4-byte value
that is naturally aligned can be read or written in one cycle. Reading a 4-byte
value that does not start on a 4-byte (or multiple) boundary requires an
additional cycle, and the requested bytes must be pieced together into a single
4-byte unit before the value can be returned.
If the processor tries to read or write
improperly aligned data, an alignment fault can occur. On x86 hardware,
alignment faults are invisible to the user; the hardware fixes up the access as
described in the previous paragraph. On x64 hardware, alignment checking is
disabled by default, and the hardware similarly fixes up the access. On the Intel
Itanium architecture, however, if an alignment fault occurs while 64-bit
kernel-mode code is running, the hardware raises an exception. (For user-mode
code, raising an exception is the default behavior, which an individual
application can change, although disabling alignment exceptions on the Itanium
can severely degrade performance.)
To prevent exceptions and performance problems
that are related to misalignment, you should lay out your data structures
carefully. When allocating memory, ensure that you allocate enough space to
hold not just the natural data size, but also the padding that the compiler
adds. For example, the following structure includes a 32-bit value and an
array. The array elements can be either 32 or 64 bits long, depending on the
hardware.
struct Xx {
    DWORD NumberOfPointers;   /* number of elements in Pointers */
    PVOID Pointers[1];        /* variable-length array; storage is */
                              /* allocated at run time */
};
When this declaration is compiled for
64-bit hardware, the compiler adds an extra 4 bytes of padding after NumberOfPointers
to align the Pointers array on an 8-byte boundary. Therefore, the driver must allocate enough
memory for the padded structure. For example, if the array could have a maximum
of 100 elements, the driver should calculate the memory requirements as
follows:
FIELD_OFFSET(struct Xx, Pointers) + 100 * sizeof(PVOID)
The FIELD_OFFSET macro returns the byte
offset of the Pointers array in the structure Xx. Using this value in the
calculation accounts for any bytes of padding that the compiler might add after
the NumberOfPointers field.
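Putting this together, the allocation might look like the following sketch; the pool type and tag are illustrative:

SIZE_T size;
struct Xx *p;

/* FIELD_OFFSET covers the header and any compiler-inserted padding; */
/* the array storage is added explicitly.                            */
size = FIELD_OFFSET(struct Xx, Pointers) + 100 * sizeof(PVOID);
p = (struct Xx *)ExAllocatePoolWithTag(NonPagedPool, size, 'MyXx');
if (p != NULL) {
    p->NumberOfPointers = 100;
}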
To force alignment on a particular byte
boundary, a driver can use any of the following:
·         The storage-class qualifier __declspec(align()) or the DECLSPEC_ALIGN() macro
·         The pack() pragma
·         The PshpackN.h and Poppack.h header files
To change the alignment of a single
variable or structure, you can use __declspec(align())
or the DECLSPEC_ALIGN() macro, which is defined in the Windows DDK. The
following type definition sets alignment for the ULONG_A16 type at 16 bytes,
thus aligning the two fields in the structure and the structure itself on
16-byte boundaries:
typedef DECLSPEC_ALIGN(16) ULONG ULONG_A16;

typedef struct {
    ULONG_A16 a;
    ULONG_A16 b;
} TEST;
You can also use the pack() pragma to specify the alignment of structures. This pragma applies
to all declarations that follow it in the current file and overrides any
compiler switches that control alignment. By default, the DDK build environment
uses pack(8). This setting means that any data item whose natural alignment
is 8 bytes or less is naturally aligned, but not necessarily 8-byte aligned, and
anything larger than 8 bytes is aligned on an 8-byte boundary. Thus, two
adjacent ULONG fields in an 8-byte-aligned structure are packed together, with
no padding between them.
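For example, the following sketch of an illustrative on-wire header shows how the pragma changes member offsets:

#pragma pack(push, 1)
typedef struct _WIRE_HEADER {
    UCHAR Type;     /* offset 0 */
    ULONG Length;   /* offset 1 under pack(1); would be offset 4 */
                    /* under the default pack(8)                 */
} WIRE_HEADER;      /* sizeof is 5 under pack(1), 8 by default */
#pragma pack(pop)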
Another way to change the alignment of
data structures in your code is to use the header files PshpackN.h (pshpack1.h, pshpack2.h, pshpack4.h,
pshpack8.h, and pshpack16.h) and Poppack.h, which are installed as part of the
Windows DDK. The PshpackN.h files
change alignment to a new setting, and Poppack.h returns alignment to its
setting before the change was applied. For example:
#include <pshpack2.h>
typedef struct _STRUCT_THAT_NEEDS_TWO_BYTE_PACKING {
    /* contents of structure ... */
} STRUCT_THAT_NEEDS_TWO_BYTE_PACKING;
#include <poppack.h>
In the example, the pshpack2.h file sets
2-byte alignment for everything that follows it in the source code, until the
poppack.h file is included. You should always use these header files in pairs.
Like the pack() pragma, they
override any alignment settings specified by compiler switches.
For more information about alignment and
the Microsoft compilers, see the Windows DDK and the MSDN library, which are listed
in the Resources section of this paper.
Cache-Line Alignment
When you design your data structures, you
can further increase the efficiency of your driver by considering cache-line
alignment in addition to natural alignment.
Memory that is cache-aligned starts at a processor
cache-line boundary. When the hardware updates the processor cache, it always
reads an entire cache line rather than individual data items. Therefore, using
cache-aligned memory can reduce the number of cache updates necessary when the
driver reads or writes the data and can prevent other components from
contending for updates of the same cache line. Any memory that starts on a page
boundary is cache-aligned.
Drivers typically allocate nonpaged,
cache-aligned memory to hold frequently accessed driver data. If possible, lay
out data structures so that individual fields are unlikely to cross cache line
boundaries. The size of a cache line is generally from 16 to 128 bytes,
depending on the hardware. The KeGetRecommendedSharedDataAlignment
function returns the recommended alignment on the current hardware.
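For example, a driver can round a per-item allocation size up to the recommended alignment so that consecutive items never share a cache line. In this sketch, the ITEM structure is illustrative:

typedef struct _ITEM { ULONG Data[8]; } ITEM;   /* illustrative */

ULONG  alignment;
SIZE_T itemSize;

/* KeGetRecommendedSharedDataAlignment returns the recommended      */
/* alignment (typically the cache-line size); in practice the value */
/* is a power of 2, which the mask arithmetic below assumes.        */
alignment = KeGetRecommendedSharedDataAlignment();
itemSize  = (sizeof(ITEM) + alignment - 1) & ~((SIZE_T)alignment - 1);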
Cache-line alignment is also important
for shared data that two or more threads can access concurrently. To reduce the
number of cache updates, fields that are protected by the same lock and are updated
together should be in the same cache line. Structures that are protected by
different locks and can therefore be accessed simultaneously on two different
processors should be in different cache lines. Laying out data structures in
this way prevents processors from contending for the same cache line, which can
have a profound effect on performance.
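For example, assuming 64-byte cache lines, a layout like the following keeps each lock and the data it protects together while separating the two independently locked groups. The structure is illustrative:

typedef struct _QUEUE_STATE {
    /* First cache line: one lock plus the fields it protects          */
    /* (64 bytes is an assumption; query the real size at run time).   */
    DECLSPEC_ALIGN(64) KSPIN_LOCK InLock;
    LIST_ENTRY InQueue;

    /* Second cache line: an independently acquired lock and its data, */
    /* so two processors do not contend for the same line.             */
    DECLSPEC_ALIGN(64) KSPIN_LOCK OutLock;
    LIST_ENTRY OutQueue;
} QUEUE_STATE;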
For more information about cache line alignment
on multiprocessor systems, see “Multiprocessor Considerations for Kernel-Mode
Drivers,” which is listed in the Resources section at the end of this paper.