Query Processor Internals and Architecture:
We always execute statements, execute queries but never
think of what happens when we click on Execute /f5 isn’t? So here I would like
to share my knowledge what I know about it. Please comment or add if I miss
anything.
Before dig deep into it, let’s understand the internal
architecture of server studio. We’ll look primarily at how things are processed
on the client, how the various clients interact with SQL Server, and what SQL
Server does to process clients' requests.
SQL Server Is a Client/Server System:
SQL Server is a client/server system which means that a
piece of SQL Server, the client API portion, sits somewhere remotely in the
process structure, separate from the server component itself.
First, a client sends in a login message (or event) on a connection
and gets back a success or failure response. When you want to send in a SQL
statement, there is a SQL language message that a client can package up and
send to SQL Server.
The server runs a separate multithreaded process, and it
services requests coming from clients, no matter where those clients are
located. The real interface to SQL Server is the Tabular Data Stream (TDS)
protocol that is spoken between the client and the server. TDS is a simple
protocol. The TDS packets are encapsulated in the packets built for the
protocol stack used by the Net-Libraries. For example, if you are using the
TCP/IP Sockets Net-Library, then the TDS packets are encapsulated in the TCP/IP
packets of the underlying protocol.
The contents of the packets that send result sets back to
the application depends on whether FOR XML is specified in the Transact-SQL
statement transmitted to the database engine:
• If FOR
XML is not specified, the database engine sends a relational result set back to
the application. The TDS packets contain the rows of the result set, with each
row comprised of one or more columns, as specified in the select list of the
SELECT statement.
• If FOR
XML is specified, the database engine streams an XML document back to the
application. The XML document is formatting in the TDS packets as if it were a
single, long Unicode value, with each packet being approximately 4 KB in size.
Client Architecture:
Figure shows the simplified architecture of client. Our
Application runs in client side which can use many APIs to interact with
Databases. Here we can see some APIs like OLE-DB, ODBC, DB-Library, and
Embedded SQL. These APIs interact with Net Library which selects suitable
network protocol to transmit request to server side.
OLE-DB:
OLE DB (Object Linking and Embedding) is an API (Application
Programming Interface) which is designed by Microsoft. Let us assume a
condition when our application (web or desktop) have to interact with database
in that case we use this API for Microsoft visual studio.
There are
two main concepts in OLE-DB consumer and provider. Consumer (the data access
application) accesses data from a data store with the help of a provider (the
intermediary). I.e. when request comes from consumer which can be for
retrieving data or updating the data store then provider helps to retrieve data
or write back to the data store.
ODBC:
Similar to OLE-DB, this is also an API which access
Databases. This is developed by SQL Access group in 1992.The best thing about
ODBC is that application developers don't need to modify their applications to
allow them to access data from several back-ends. As long as there is an ODBC
Driver for a particular back-end, an ODBC-enabled front-end can access it. And
second, one ODBC Driver for a particular DBMS allows any ODBC-enabled
application to be an ODBC client.
Here
there are two components ODBC CLIENT, ODBC DRIVER. The ODBC client uses a
language or vocabulary of commands (which is referred to as "ODBC")
to request data from, or to send data to, the back- end or server DBMS.
However, the DBMS doesn't understand the ODBC client request until the command
passes through the ODBC Driver for that specific DBMS. This ODBC driver is
software that resides on the front-end. The ODBC driver translates the command
into a format that the ODBC Server can understand. The ODBC Server sends the
answer back to the ODBC Driver, which translates the answer into a format that
the ODBC Client can understand.
DB-LIBRARY APLICATION:
DB-Library for C is an application programming interface
(API) consisting of C functions and macros that allow an application to
interact with SQL Server. Included are functions that send Transact-SQL
statements to SQL Server and functions that process the results of those
statements. Other functions handle errors and convert data.
CLIENT- NET Library:
NET-Library is component which is responsible for
communication between client and server. Library which resides in client side
called client Net- Library. Here The OLE DB provider, ODBC driver, or
DB-Library DLL calls a client Net-Library. The client Net-Library calls an IPC
(Interprocess Communication) API. If it is a local IPC, calls are transmitted
using a Windows operating IPC such as shared memory or local named pipes. If it
is a network IPC, the network protocol stack on the client uses the network to
communicate with the network protocol stack on the server.
Thus
this decides what kind of protocol to be used for communication between client
and server.
Server Architecture:
On the left are the components that address compiling
queries, including the query optimizer. The optimizer is one of the most
mysterious parts of any relational database engine but is also one of the most
critical from a performance standpoint. It is the responsibility of the query
optimizer to take the nonprocedural request expressed in SQL and translate it
into a set of disk I/Os, filtering, and other procedural logic that efficiently
satisfies the request. On the right-hand side is the execution infrastructure.
This is really a much smaller set of facilities. Once the compilation
components have finished their job, they have created something that can be
directly executed with just a few services thrown in.
In the middle of the figure is something called the SQL
Manager. It controls the flow of everything inside the SQL Server. RPC messages
are handled by the SQL Manager, and in SQL Server 7.0 that is the majority of
invocations of functionality coming from clients. The pseudo system stored
procedures, are also logically a part of the SQL Manager. SQL statements
typically coming in as TDS SQL Language messages are processed directly from
the compilation side; this is less common in SQL Server 7.0 than in previous
versions, but is still common. Results are sent back out by components in the
execution engine calling ODS to format the TDS results messages.
There are also a couple of components that we'll only
mention briefly, which provide additional services within the relational
engine. One of these components is the catalog services component, which
handles data such definition statements as CREATE TABLE, CREATE VIEW, and so
forth. Catalog services also handle system tables, materializing those that are
really pseudo tables. The catalog services component is located primarily in
the relational engine, but actually about one-third of it operates within the
sphere of the storage engine, so it is treated as a shared component.
Another component of the relational engine is the User Mode
Scheduler (UMS), SQL Server's own internal scheduler for fibers and threads.
There is a very sophisticated internal mechanism for scheduling how work is
assigned to either fibers or threads, depending on how you've configured the
server, and allows SQL Server to do the proper load balancing across processors
on an SMP system. The UMS also keeps SQL Server from thrashing by running too
many threads concurrently. Finally, there are the system procedures that people
are familiar with; logically they are part of the relational engine.
Client/Server Interactions When Processing SQL:
We'll now take a look at what happens on the client side
when a client application is interacting with the SQL Server. Here is an
example of an ODBC call:
SQLExecDirect(hstmt, "SELECT * FROM TableName",
SQL_NTS)
As it’s clear from flow diagram, server checks whether
request from client is a cursor or not? If not then it goes to left side else
right side. Here select statement is not cursor type so goes to right side and
checked again whether this statement has parameter? , As select statement has
no parameter so sent to server as SQL message to the compiler.
Then following steps that SQL Server uses to process a
single SELECT statement:
1. The
parser scans the SELECT statement and breaks it into logical units such as
keywords, expressions, operators, and identifiers.
2. A query
tree, sometimes referred to as a sequence tree, is built describing the logical
steps needed to transform the source data into the format required by the
result set.
3. The query
optimizer analyzes different ways the source tables can be accessed. It then
selects the series of steps that returns the results fastest while using fewer
resources. The query tree is updated to record this exact series of steps. The
final, optimized version of the query tree is called the execution plan.
4. The
relational engine starts executing the execution plan. As the steps that
require data from the base tables are processed, the relational engine requests
that the storage engine pass up data from the rowsets requested from the
relational engine.
5. The
relational engine processes the data returned from the storage engine into the
format defined for the result set and returns the result set to the client.
References:
Client
Net library: http://msdn.microsoft.com/en-us/library/aa174500(v=sql.80).aspx
Good job man.
ReplyDeleteHappy to see you in blog world.
ReplyDeleteAre you going to be the SQL only guy? :)
This comment has been removed by the author.
DeleteNo joy , i will write about other technologies as well :)
Delete