
Category: TSQL

Learning SQL, 2nd Edition – Creating tables as described in the book

The Premise

I had a conversation about a book that I actually studied myself… 10 years ago – a truly venerable publication. It came up that the table creation statements for the tables described at the beginning of the book – which I remember using – actually don’t exist.

I’m going to walk you through working from the descriptions as provided, along with a little guide to doing the same with random data you find.


The Data

In the book we see the following two sets:

In the dept table, we have a DEPTNO, DNAME, and LOC (department number, department name, and location respectively – database people love shortening things).

In the emp table we have EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO

Now – depending on whether you’re learning SQL Server, SQLite, Postgres, MySQL, Snowflake, etc. – there’s going to be a slightly different syntax for creating tables, selecting data, and the rest – so keep your official documentation open and learn to love it.

For my example I’ll be using SQL Server so here’s the documentation on CREATE TABLE for SQL Server.


Our Process

First, we’ll declare and name our tables, then we’ll add the columns we need, and finally we’ll pick the types we want to use.

This is all boilerplate except for the emp and dept portions respectively – we’re saying create a table called emp and another called dept in the dbo schema, with the following columns in parentheses.

CREATE TABLE dbo.emp();
CREATE TABLE dbo.dept();

Pull out the columns from the dataset and put them in their respective tables.

CREATE TABLE dbo.emp(EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO);
CREATE TABLE dbo.dept(DEPTNO, DNAME, LOC);

Finally, we want to define a data type to act as an appropriate box to place the data.

A well-chosen data type maps well to the current and potential values in the column, and the stricter you can be about choosing smaller types or constrained values, the more likely you are to avoid nonsense data. In some cases it can also greatly improve performance.

Now, we’ll start with a very pragmatic approach (for time purposes) and I will leave you with a challenge at the end of the post!

Crack open the SQL Server Data Types page as your reference – learning to read the official docs is invaluable.

The main categories are:

  1. Exact numerics – Exact numbers, use these.
  2. Unicode character strings – Modern text, use these.
  3. Approximate numerics – Generally avoid unless you need floating point math.
  4. Binary strings – You’ll know when you need them.
  5. Date and time – Useful for common date and time operations and comparisons.
  6. Other data types – Some useful, some weird, you’ll probably use uniqueidentifier and xml soon enough.
  7. Character strings – Less common these days but if you have specific types of data it can save you space.

Looking at the sample data from the book, I see numbers with no decimal points or special notation that are “smaller” (Exact Numerics > INT), a date that has no time component (Date and time > DATE), and some words that might be user input (Unicode character strings > NVARCHAR).

What does that leave us?

CREATE TABLE dbo.emp(EMPNO INT, ENAME NVARCHAR(50), JOB NVARCHAR(50), MGR INT, HIREDATE DATE, SAL INT, COMM INT, DEPTNO INT);
CREATE TABLE dbo.dept(DEPTNO INT, DNAME NVARCHAR(50), LOC NVARCHAR(50));


Importing Data

Before we make this any more complicated, let’s go over simple ways to import data.

There are MANY data connectors and other methods you can use to INSERT data into a table, but if you’re just starting out, most of them feel pretty hard to bootstrap.

If you’re familiar with these methods then ignore this and move on; but if you don’t have experience with these tools and you’re dealing with fewer than a few hundred rows of data, here’s a simple way to accomplish your goal that almost anyone can do.


Disclaimer: Don’t do this forever, folks – there are many faster and simpler ways once you have more programming chops, but we used to use this all the time in the support world.

This will also break on weird inputs with single quotes (you can replace them) and tabs (when you paste between environments they might think there are extra columns).

Crack open Google Sheets or Excel (if you have it) and copy/paste your data from the book/data source into it.

If you look closely you can see that all of our data ended up in Column A – this is bad!

Either use the clipboard icon in the lower right or the data tab to split text to columns. You can find a similar option in Excel.

In Excel we’d choose a tab as the separator, but Sheets makes it simpler when choosing what breaks things apart – for this data make sure to choose space.

After the separation, we can see that each piece of data ends up in its own column, awesome!

Next, we need to make space between each column for some additional SQL syntax, so right click on each column and insert 1 column left.

When we’re done, go ahead and remove the top two rows:

This should leave us with just the data.

Next we want to add some code: in the first column and first row you type SELECT '

In the next (and each following) row you copy/paste UNION ALL SELECT '

In every other new column you add a quote and a comma, but in both Excel and Sheets, if you want to start a cell with a single quote you need to write a doubled single quote – so those cells contain '',' and '' respectively. Your cells should look like:

When you finish, your sheet should look like this, repeating down as far as you have data.

Now here’s the interesting part – if you select all the data and copy/paste it into SSMS you’ll have a valid SQL statement (with one small flaw!)

So what’s the problem? Well the spreadsheets of the world tend to export tab separated data. So we want to remove the tabs.

How do we do that?

In SQL Server Management Studio there’s a Find and Replace option with a special little feature – regular expressions.

We’re not going to do anything fancy – just click the star (to enable regular expressions), type \t into the find field, and nothing into the replace field. Click the far-right Replace All button and watch the text tighten up.

Add your INSERT statement with your table and list of columns to the top of your data and you’re ready to rock!

INSERT dbo.emp(EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
SELECT '7369','SMITH','CLERK','7902','17-Dec-2005','800','20',''
UNION ALL SELECT '7499','ALLEN','SALESMAN','7698','20-Feb-2006','1600','300','30'
UNION ALL SELECT '7521','WARD','SALESMAN','7698','22-Feb-2006','1250','500','30'
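
If the UNION ALL pattern feels clunky, a table value constructor does the same job in one statement – here’s a quick sketch using the same rows as above (VALUES lists like this work on SQL Server 2008 and later):

INSERT dbo.emp(EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
VALUES
    ('7369','SMITH','CLERK','7902','17-Dec-2005','800','20',''),
    ('7499','ALLEN','SALESMAN','7698','20-Feb-2006','1600','300','30'),
    ('7521','WARD','SALESMAN','7698','22-Feb-2006','1250','500','30');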

I leave it to you to do the same for the dept table – it’s a lot fewer columns!


Final thoughts

We could add much more to this table declaration – we probably don’t want to create an employee with no name, or an employee with no department, or delete an entire department accidentally!

The CREATE TABLE syntax is flexible enough to handle these cases and more – declaring indexes or relationships can make a table much more powerful for modeling your problem and preventing misuse.

-- dept has to be created first so that emp's foreign key has something to reference.
CREATE TABLE dbo.dept
(
  DEPTNO INT IDENTITY(0,10) PRIMARY KEY,
  DNAME NVARCHAR(50) NOT NULL,
  LOC NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.emp
(
  EMPNO INT IDENTITY PRIMARY KEY,
  ENAME NVARCHAR(50) NOT NULL,
  JOB NVARCHAR(50) NOT NULL,
  MGR INT,
  HIREDATE DATE NOT NULL,
  SAL INT NOT NULL,
  COMM INT,
  DEPTNO INT NOT NULL REFERENCES dbo.dept(DEPTNO)
);

There’s still plenty we could do to make this table declaration better! If you’re still reading and learning and want a starter challenge –

Can you figure out the optimal length for our NVARCHAR fields given the data in the book?
Can you write a SQL query that can tell us how long the fields should be?
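
If you want a nudge without the full answer, a starting sketch (assuming the emp table above is created and loaded) might simply check the longest value currently in each text column:

-- A hint, not a complete solution: the longest value currently stored in each text column.
SELECT
  MAX(LEN(ENAME)) AS ENAME_MaxLength,
  MAX(LEN(JOB)) AS JOB_MaxLength
FROM dbo.emp;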

Fixing – Error: “PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target”

While attempting to connect PyCharm to a SQL Server data source with Windows Authentication, I ran into a slew of errors. Upon searching I noticed there was a dearth of resources on this issue.

Some posts mention messing with your certificate store, but with a local server I don’t have the interest in configuring certificate-based auth.

From experience I knew that a security issue reported during authentication is generally related to the certificate/encryption settings on the connection, and often turning encryption off works fine (don’t do this in production).

I found nothing useful in the SSL/TLS section – clearly the standard “uncheck encryption” trick isn’t managed here.

Looking further I found the Driver section – here the MSSQL driver settings I needed are available.

Click the Advanced tab.

Update integratedSecurity to true and encrypt to false.

This fixed my authentication issues with the local server.
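
For reference, the same settings can be expressed directly in the JDBC URL – roughly like the line below, where the host, port, and database name are placeholders for your own local setup:

jdbc:sqlserver://localhost:1433;databaseName=master;integratedSecurity=true;encrypt=false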

How do companies handle blue/green deployments with their SQL Server database?

An interesting discussion/question in the SQL Community Slack today arose around how to implement blue/green deploys.

If you’re not familiar – blue/green refers to a deployment strategy with at least two hosts of your services, where you serve traffic from the green, deploy to the blue, and slowly drain traffic from the green to the blue until it becomes the new green.

This has consequences in terms of keeping the lights on for both services, potentially rolling back the traffic to the green node (if the blue deployment fails some tests) and identifying things like dead code/data paths.

I was pondering how to answer the question in anything but the most generic way when this YouTube video by Kevin Feasel was posted, and it’s such a great resource I’m reposting it.

Thanks @reid and Kevin!

Fixing Unicode Conversion Issues in XML documents, TRY_CONVERT returns question mark instead of NULL

An interesting question asked by @danthesqlman in #sqlhelp (sqlcommunity.slack.com)

Having issues with Unicode in my XML, tried using a try_convert(varchar,fieldname) but not returning NULL.
Set it to have a test on my box, and weird results.
declare @n nvarchar(10) = N'ניקודות‎'
select try_convert(varchar(10),@n)
This doesn’t return NULL, but ?????????
I’m curious what would I be doing wrong, or how can i locate unicode within XML easily

And then when people suggested individual character shredding –

XML documents in a table over 200k rows, 2mb xml each, could take hours to parse 1 character at a time

There were a few suggestions (my initial crap one was just dumping it out to C#), but after a few jokes back and forth about how SQL Server was just returning normal question marks for TRY_CONVERT, and how silly that was, the idea came up… why not just:

  1. Replace all question marks with something unique? (I suggested a GUID)
  2. Run the conversion and then do a reverse replace, updating the data in place.
  3. Profit!

For a simple code example…

-- Store the marker as text so REPLACE can use it (string functions won't take a uniqueidentifier directly).
DECLARE @magic_value NVARCHAR(36) = NEWID();

SELECT
    TRY_CONVERT
    (
        VARCHAR(100),
        REPLACE(tar.name, N'?', @magic_value)  -- mask the real question marks before converting
    )
FROM target_table AS tar;

Any new question marks that exist in the output would be characters that failed the conversion process.
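
As a rough follow-on sketch (reusing @magic_value and the hypothetical target_table from above), flagging the rows that contain at least one failed character is then just a LIKE away:

-- Rows where the converted text contains a question mark that was NOT in the original data.
SELECT tar.name
FROM target_table AS tar
WHERE TRY_CONVERT(VARCHAR(100), REPLACE(tar.name, N'?', @magic_value)) LIKE '%?%';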

The test ran in ten minutes instead of a few hours… great!

Also a fun followup on weird SQL Server homoglyph conversion issues in general – https://bertwagner.com/posts/how-unicode-homoglyphs-can-thwart-your-database-security/

Grouping groups of strings within strings in TSQL.

So a friend of mine had a query puzzle – he needed to print some W2 forms, but the forms themselves only allow up to 4 groups of a value to be placed in a box, or else they’d need to issue two forms.

The pickle is that the source data is free-form CSV text, so what can we do on the database side?

I reached for my handy CROSS APPLY, a numbers table, windowing functions, and a little modulo arithmetic – it’s not so bad to return a subgroup within a group in SQL as long as you are willing to lay out additional columns to count your grouping.
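
The original query isn’t reproduced here, but a minimal sketch of the idea could look like the following. It assumes a hypothetical dbo.W2Source table with an EmployeeId and a free-form CSV column named CodeList, and it swaps the numbers-table split for STRING_SPLIT (SQL Server 2016+) to keep things short:

SELECT
    src.EmployeeId,
    s.value AS Code,
    -- integer division chunks the values into groups of 4: one group per form box
    (ROW_NUMBER() OVER (PARTITION BY src.EmployeeId ORDER BY s.value) - 1) / 4 AS FormNumber,
    -- modulo gives each value's position within its group of 4
    (ROW_NUMBER() OVER (PARTITION BY src.EmployeeId ORDER BY s.value) - 1) % 4 AS BoxPosition
FROM dbo.W2Source AS src
CROSS APPLY STRING_SPLIT(src.CodeList, ',') AS s;
-- Note: this orders by value; preserving the original CSV order needs the numbers-table split
-- (or STRING_SPLIT's ordinal argument on SQL Server 2022+).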

Dynamic UNPIVOT Reporting on Cardinality.

Today I was diving into some extremely wide tables and wanted to take a quick look at things like “How many unique values does this table have in each column?”

This can be super useful if you have a spreadsheet of results or a schema without effective normalization and you want to determine which columns are the “most unique” – or have high cardinality.

The GitHub gist is embedded at the bottom of the page, but I will run you through the code in case you want an explanation of how it works.

The procedure takes a schema and a table/view name and combines them – I could ask the user to do this for me, but there are parts where it’s useful to break them up and I don’t want to mess with split logic :p

The procedure is defined as a temporary stored procedure, as I didn’t want to leave a trace when I disconnect.

I then define the dynamic UNPIVOT statement, which we will feed with our dynamic column listing.

Gather our columns (twice) – once for the COUNT(DISTINCT ...) expressions, and once for use in the list of columns we want to UNPIVOT.
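
The full gist is at the bottom of the post, but a stripped-down sketch of the core idea (using a hypothetical dbo.SomeWideTable, STRING_AGG from SQL Server 2017+, and none of the NULL handling mentioned later) looks something like this:

DECLARE @counts NVARCHAR(MAX), @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Gather the columns twice: once as COUNT(DISTINCT ...) expressions, once as the UNPIVOT column list.
SELECT
    @counts = STRING_AGG(CAST('COUNT(DISTINCT ' + QUOTENAME(c.name) + ') AS ' + QUOTENAME(c.name) AS NVARCHAR(MAX)), ', '),
    @cols   = STRING_AGG(CAST(QUOTENAME(c.name) AS NVARCHAR(MAX)), ', ')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.SomeWideTable');

SET @sql = N'
SELECT up.ColumnName, up.DistinctValues
FROM (SELECT ' + @counts + N' FROM dbo.SomeWideTable) AS src
UNPIVOT (DistinctValues FOR ColumnName IN (' + @cols + N')) AS up;';

EXEC sys.sp_executesql @sql;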

Here’s an example of running it against a tsqlt testresult table:


As we can see, the tsqlt testresult table has one class, two messages and results, and unique everything else (so that’s where I will focus.)

I also utilize a @magic_value variable in the code (not shown) to deal with the fact that a NULL value in an UNPIVOT statement won’t count as an occurrence. I want to disambiguate between NULL and any particular value that might actually occur, so using something like -1 or the string ‘NULL’ would be inappropriate.

That’s it for today!

Grab the full code below –

Postgres Plugin Support Added in Azure Data Studio!!

I know I have been writing a lot about ADS recently, but this is even bigger than the Notebook announcement.

A Postgres plugin has been announced in the insider release of ADS, and it just works!

If the term Postgres is unfamiliar – PostgreSQL is one of the preeminent open source database solutions and is showing wide adoption due to its quality and of course, price.

For those of you in the Postgres world, you already have probably bounced between a few SQL clients and if you are like me, not been totally impressed with the results.

For those of you who are SQL Server data professionals and live in the comfortable world of SSMS and ADS, you may not know that a significant part of the world still runs queries on the command line, or is forced to rebuild their own tool to meet their needs in each language du jour.

I have been working on some side projects with Docker, Postgres, and a few other pieces of tech, so this announcement was perfect for me to test it out.

Configuration

Just like my post on Notebooks, I am going to download the most recent release from the Insider’s build, but I am going to keep the screenshots a little more brief.

Make sure you enable those preview features!

Jump to the extensions section and find the Postgres plugin. (Sidebar button)

I used a local connection to a docker container hosting my Postgres instance and Azure Data Studio connected the first try!

Intellisense works great!

Notebooks instantly inferred my context from the Postgres connection! Sweet!

Even the explain button pulls the query plan 😀

I love this direction for Azure Data Studio and I hope we see more open source databases supported in the future!

Update: T-SQL Notebooks in Azure Data Studio

It’s only been ten days since my first post about ADS Notebooks, and the team has already pushed out some super useful updates.

Launching an Azure Data Studio Notebook is now almost instantaneous. You are not prompted to install the other kernels unless you select them.

When reopening a file, you no longer are prompted to re-install a kernel and attempt configuration.


Big thanks, ADS team – Notebooks are shaping up to be a super cool tool!

T-SQL Notebooks in Azure Data Studio

Update: This blog post is already out of date due to the hard work of the ADS team! Check my Azure Data Studio Notebook Update for more details of why you can skip my warnings about your internet connection.

I have been waiting for word about the new Notebook functionality in Azure Data Studio, and when I heard it was available in the insider build, I jumped in to take a look.

A Jupyter Notebook is a web application that allows you to host programming languages, run code (often with different programming languages), return results, annotate your data, and importantly, share the source controlled results with your colleagues.

Alright, here’s what I did:

  • Grab the insider’s build from Github.
  • Install Azure Data Studio (ADS.)
  • Create a new Notebook (and have it download some stuff.)
  • Run some queries!

Keep in mind I am talking about the Insider’s Build as of Thursday March 7, 2019 – if you download it in the future, changes are likely to have occurred.

If you want to follow along, make SURE you have a solid internet connection available, planes do NOT count.

Jump to the ADS Github page

Scroll to the latest insiders builds:

Extract or install, and launch Azure Data Studio.

Keep in mind you may want to disable updates when running the insider version, as it will try to update to the stable channel version. Thanks to Shawn Melton for the tip!

Install the tools by choosing File -> New Notebook.

The install process will warn you that it will take a while to download and configure the Jupyter environment (which includes Python and a few other dependencies).

Every time you create a new notebook, you are going to download the required dependencies from scratch – this is to ensure you have an isolated environment per notebook, but it’s worth keeping in mind for now.

Go get some coffee, it’s going to be a bit.

Maybe time for that second cup.

When our new notebook pops up, you should immediately be offered an option for a Code (T-SQL) or a Text (markdown) box.

I wrote two example queries and added an animated gif (mostly because I was watching Chris on the SQL Tools team doing the same on the sqlbits presentation for ADS https://sqlbits.com/Sessions/Event18/Introducing_Azure_Data_Studio )

All the data you return is saved into your notebook, so you can see the same results as I did if you download my notebook, and you can replay it in your environment to see if it is the same.

Going to end it there for now, but I am excited to see what people will begin passing around in notebooks for debugging, training, and demonstration purposes!

Some issues I am going to be keeping my eyeballs on over the next few weeks:


Removing non-numeric characters from a SQL Server field without UDFs or Regex

A great question came up in the SQL Community Slack today:

I saw this as a great modification on my earlier post, and wanted to show another way to implement the same solution.

Walking through the steps:

First, use sys.objects as our example “target” and assume the string has fewer than 2024 characters (spt..numbers isn’t really reliable past that point).

We substring out each character and compare it to the list of items we accept, and then collapse the list back in on itself.

At the end, I pull out sys.objects for comparison so you can see what sort of results you would get. Enjoy!
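
A minimal sketch of the technique described above (using master..spt_values as the numbers table and keeping only digits) might look like this:

-- One way it might look (a sketch, not the exact code from the post): keep only the digits
-- in sys.objects.name, using master..spt_values as the numbers table.
SELECT
    o.name,
    digits_only =
    (
        SELECT SUBSTRING(o.name, n.number, 1)
        FROM master..spt_values AS n
        WHERE n.type = 'P'
          AND n.number BETWEEN 1 AND LEN(o.name)
          AND SUBSTRING(o.name, n.number, 1) LIKE '[0-9]'   -- only keep the characters we accept
        ORDER BY n.number
        FOR XML PATH('')                                    -- collapse the kept characters back into one string
    )
FROM sys.objects AS o;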