SQL Panda: Client/Server Interactions When Processing SQL

Query Processor Internals and Architecture:

We always execute statements, execute queries but never think of what happens when we click on Execute /f5 isn’t? So here I would like to share my knowledge what I know about it. Please comment or add if I miss anything.

Before dig deep into it, let’s understand the internal architecture of server studio. We’ll look primarily at how things are processed on the client, how the various clients interact with SQL Server, and what SQL Server does to process clients' requests.

SQL Server Is a Client/Server System:

SQL Server is a client/server system which means that a piece of SQL Server, the client API portion, sits somewhere remotely in the process structure, separate from the server component itself.

First, a client sends in a login message (or event) on a connection and gets back a success or failure response. When you want to send in a SQL statement, there is a SQL language message that a client can package up and send to SQL Server.

The server runs a separate multithreaded process, and it services requests coming from clients, no matter where those clients are located. The real interface to SQL Server is the Tabular Data Stream (TDS) protocol that is spoken between the client and the server. TDS is a simple protocol. The TDS packets are encapsulated in the packets built for the protocol stack used by the Net-Libraries. For example, if you are using the TCP/IP Sockets Net-Library, then the TDS packets are encapsulated in the TCP/IP packets of the underlying protocol.

The contents of the packets that send result sets back to the application depends on whether FOR XML is specified in the Transact-SQL statement transmitted to the database engine:

• If FOR XML is not specified, the database engine sends a relational result set back to the application. The TDS packets contain the rows of the result set, with each row comprised of one or more columns, as specified in the select list of the SELECT statement.

• If FOR XML is specified, the database engine streams an XML document back to the application. The XML document is formatting in the TDS packets as if it were a single, long Unicode value, with each packet being approximately 4 KB in size.

Client Architecture:

Figure shows the simplified architecture of client. Our Application runs in client side which can use many APIs to interact with Databases. Here we can see some APIs like OLE-DB, ODBC, DB-Library, and Embedded SQL. These APIs interact with Net Library which selects suitable network protocol to transmit request to server side.

OLE-DB:

OLE DB (Object Linking and Embedding) is an API (Application Programming Interface) which is designed by Microsoft. Let us assume a condition when our application (web or desktop) have to interact with database in that case we use this API for Microsoft visual studio.

There are two main concepts in OLE-DB consumer and provider. Consumer (the data access application) accesses data from a data store with the help of a provider (the intermediary). I.e. when request comes from consumer which can be for retrieving data or updating the data store then provider helps to retrieve data or write back to the data store.

ODBC:

Similar to OLE-DB, this is also an API which access Databases. This is developed by SQL Access group in 1992.The best thing about ODBC is that application developers don't need to modify their applications to allow them to access data from several back-ends. As long as there is an ODBC Driver for a particular back-end, an ODBC-enabled front-end can access it. And second, one ODBC Driver for a particular DBMS allows any ODBC-enabled application to be an ODBC client.

Here there are two components ODBC CLIENT, ODBC DRIVER. The ODBC client uses a language or vocabulary of commands (which is referred to as "ODBC") to request data from, or to send data to, the back- end or server DBMS. However, the DBMS doesn't understand the ODBC client request until the command passes through the ODBC Driver for that specific DBMS. This ODBC driver is software that resides on the front-end. The ODBC driver translates the command into a format that the ODBC Server can understand. The ODBC Server sends the answer back to the ODBC Driver, which translates the answer into a format that the ODBC Client can understand.

DB-LIBRARY APLICATION:

DB-Library for C is an application programming interface (API) consisting of C functions and macros that allow an application to interact with SQL Server. Included are functions that send Transact-SQL statements to SQL Server and functions that process the results of those statements. Other functions handle errors and convert data.

CLIENT- NET Library:

NET-Library is component which is responsible for communication between client and server. Library which resides in client side called client Net- Library. Here The OLE DB provider, ODBC driver, or DB-Library DLL calls a client Net-Library. The client Net-Library calls an IPC (Interprocess Communication) API. If it is a local IPC, calls are transmitted using a Windows operating IPC such as shared memory or local named pipes. If it is a network IPC, the network protocol stack on the client uses the network to communicate with the network protocol stack on the server.

Thus this decides what kind of protocol to be used for communication between client and server.

Server Architecture:

On the left are the components that address compiling queries, including the query optimizer. The optimizer is one of the most mysterious parts of any relational database engine but is also one of the most critical from a performance standpoint. It is the responsibility of the query optimizer to take the nonprocedural request expressed in SQL and translate it into a set of disk I/Os, filtering, and other procedural logic that efficiently satisfies the request. On the right-hand side is the execution infrastructure. This is really a much smaller set of facilities. Once the compilation components have finished their job, they have created something that can be directly executed with just a few services thrown in.

In the middle of the figure is something called the SQL Manager. It controls the flow of everything inside the SQL Server. RPC messages are handled by the SQL Manager, and in SQL Server 7.0 that is the majority of invocations of functionality coming from clients. The pseudo system stored procedures, are also logically a part of the SQL Manager. SQL statements typically coming in as TDS SQL Language messages are processed directly from the compilation side; this is less common in SQL Server 7.0 than in previous versions, but is still common. Results are sent back out by components in the execution engine calling ODS to format the TDS results messages.

There are also a couple of components that we'll only mention briefly, which provide additional services within the relational engine. One of these components is the catalog services component, which handles data such definition statements as CREATE TABLE, CREATE VIEW, and so forth. Catalog services also handle system tables, materializing those that are really pseudo tables. The catalog services component is located primarily in the relational engine, but actually about one-third of it operates within the sphere of the storage engine, so it is treated as a shared component.

Another component of the relational engine is the User Mode Scheduler (UMS), SQL Server's own internal scheduler for fibers and threads. There is a very sophisticated internal mechanism for scheduling how work is assigned to either fibers or threads, depending on how you've configured the server, and allows SQL Server to do the proper load balancing across processors on an SMP system. The UMS also keeps SQL Server from thrashing by running too many threads concurrently. Finally, there are the system procedures that people are familiar with; logically they are part of the relational engine.

Client/Server Interactions When Processing SQL:

We'll now take a look at what happens on the client side when a client application is interacting with the SQL Server. Here is an example of an ODBC call:

SQLExecDirect(hstmt, "SELECT * FROM TableName", SQL_NTS)

As it’s clear from flow diagram, server checks whether request from client is a cursor or not? If not then it goes to left side else right side. Here select statement is not cursor type so goes to right side and checked again whether this statement has parameter? , As select statement has no parameter so sent to server as SQL message to the compiler.

Then following steps that SQL Server uses to process a single SELECT statement:

1. The parser scans the SELECT statement and breaks it into logical units such as keywords, expressions, operators, and identifiers.

2. A query tree, sometimes referred to as a sequence tree, is built describing the logical steps needed to transform the source data into the format required by the result set.

3. The query optimizer analyzes different ways the source tables can be accessed. It then selects the series of steps that returns the results fastest while using fewer resources. The query tree is updated to record this exact series of steps. The final, optimized version of the query tree is called the execution plan.

4. The relational engine starts executing the execution plan. As the steps that require data from the base tables are processed, the relational engine requests that the storage engine pass up data from the rowsets requested from the relational engine.

5. The relational engine processes the data returned from the storage engine into the format defined for the result set and returns the result set to the client.

References:

Client Net library: http://msdn.microsoft.com/en-us/library/aa174500(v=sql.80).aspx

OLE-DB: http://msdn.microsoft.com/en-us/library/windows/desktop/ms723100(v=vs.85).aspx

and http://msdn.microsoft.com/en-us/library/aa288452(v=vs.71).aspx