
Awesome tools: Papa Parse

My most recent problem for my side project was implementing a drag-and-drop upload for CSV (comma-separated values) data.

From experience I know that CSV data is fairly easy to malform, so I wanted to present a nicely formatted list where a user could preview the rows and opt out of certain items.

I started reading and writing some code to take a dropped file (which turns out to be straightforward in HTML5) and call some sort of preview function with it.

After a few frustrating hours and an active disinterest in node modules, I found a tool that really solved the problem right for me: Papa Parse.

Somewhere between a spreadsheet and a set of buttons I built a simple CSV preview.

Since I would qualify my JavaScript as “learning level”, one of the big wins for me is any tool that solves my problem in a sane, reusable way without going off the deep end into JS land.

This was the entire set of code I needed to take a file object and return a useful set of rows.

let config = {
  delimiter: "",  // auto-detect
  newline: "",    // auto-detect
  quoteChar: '"',
  escapeChar: '"',
  header: true,           // first row becomes the field names
  skipEmptyLines: true,
  delimitersToGuess: [',', '\t', '|', ';', Papa.RECORD_SEP, Papa.UNIT_SEP]
};

let useful_data = Papa.parse(data, config);

Shout out to jQuery for the rest – on finishing the load, call a function that draws a form with some checkboxes and a submission form for them all.
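
For reference, here’s a rough sketch of how the pieces can fit together when the input is a dropped File rather than a string – the element id, the drawPreview callback, and the handler wiring are placeholders, not the exact code from my project:

// Minimal sketch: wire up an HTML5 drop target and hand the dropped file to Papa Parse.
const dropZone = document.getElementById('drop_zone'); // placeholder element id

dropZone.addEventListener('dragover', (e) => e.preventDefault());

dropZone.addEventListener('drop', (e) => {
  e.preventDefault();
  const file = e.dataTransfer.files[0]; // the dropped CSV file

  // When given a File (instead of a string), Papa Parse runs asynchronously and
  // delivers the rows to the complete callback rather than returning them.
  Papa.parse(file, {
    header: true,
    skipEmptyLines: true,
    complete: (results) => drawPreview(results.data) // placeholder preview/draw function
  });
});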

First glance at Power BI

A simple problem I think plenty of people are having – can you take our data and whip up a simplified BI tool UI? Today I am reviewing Power BI (PBI).

The PBI Service allows you to publish your reports from the PBI Desktop tool, which requires you to set up an embedded PBI service in your Azure account.

Getting data into PBI is straightforward and many data sources are supported – I was tempted to jump in with the SQL DB option, but as this is all still at the prototype stage I stuck with tried-and-true Excel. I know that this means I skip over the data load issues and the like, but that’s not the purpose of this exercise.

As we are looking at an analytical dataset I created a calendar tab, a dimension tab, and two fact tabs with dummy data.

The PBI import process is pretty smooth on the first attempt, but you’ll likely want to do some cleanup. Click on the modeling tab and verify it has inferred your relationships the way you expect, or add a few of your own.

As the data was imported from Excel, I noticed that the sigma symbol (the sign that a value can be used in summary calculations) was missing from many of the columns – they were numeric fields containing NULLs, so the import process decided they were string fields.

For the purposes of this demo it was fine to switch the NULLs to 0 values, so I updated the data in the sheet, updated the type in the model, and refreshed the data – no problem. This may not be the best solution if zero is meaningfully different from NULL in your calculations.

I experimented with the various built-in visuals and noted that there are some pretty good third-party “verified” visuals as well. Overall, most of the visualizations are very straightforward to use, with the one exception being that pie chart sub-groupings are all the same color.

Once I had published my starter visual, I immediately wanted to go back and make revisions, and the desktop client made that fairly easy until I wanted to remove a column. Removing a reference is seemingly verboten in PBI, and it would not merge the changes.

It looks like whenever you want to make a breaking change you have a bit more GUI work to do, which brings me back to my SSIS woes. I didn’t see any nice way to “update all references” or anything like that, so this could be a tedious step if you are making many changes.

Other things that struck me:

The fact that there’s a desktop app (Windows only) for PBI publishing is a bit of a pain at the moment, but it was quick at manipulating the various visualizations – it’s just weird that it’s clearly some Electron-app-style compromise (so why not just the web?)

Drilling into data, exporting data, analyzing data – these things are super nice compared to any previous reporting system I have worked with.

It was disappointing to learn that as a PBI Pro user you can’t share with other users who don’t have a Pro license (unless you pony up $4,995 a month to start). It’s not a big deal, but hopefully one day there’s a usage-based option.

Creating dashboards and apps is confusing – you can’t see a reference to anything unless you first go to a workspace; from there you can create a dashboard from elements of a report. I don’t understand why this is so fragmented – just let me manage this in the desktop client, or at least make it available in the top-level UI.

Recipe Scraping with Python

I am working on a new project with the aim of scraping a wide variety of data sources, with targets as varied as arrest data and recipe websites.

I am going to catalog the project’s process through the various issues, and try to encapsulate as much as possible so that if there are components others want to use, they can.

Here’s the process I am taking:

  1. Decide on a language.
    • What languages are well supported for web scraping?
    • Plenty, but of them I am most familiar with Python.
  2. Decide on feature set.
    • We are going with a headless HTML scraper for now; JavaScript will be supported in future versions.
    • We need to populate some basic datapoints for a recipe, but we’ll progressively enhance sites that don’t have the things we want.
  3. Build initial targeted list of sites to scrape
    • This was done by my partner in crime, more on that later.
    • >200 sites listed to target specific delicious recipes.
  4. Find some useful resources for scraping sites with python.
  5. Construct a simple loop for testing/validating data.

This Sunday’s execution:

  1. Get an isolated VM up.
  2. Install python3 and python3-venv on Debian.
  3. Find there’s actually a few good tools which implement this in a more general way: https://github.com/hhursev/recipe-scrapers
  4. Test that out on a specific site (sorry, none of those links just yet!)
  5. Find that recipe-scrapers actually supports it… automagically, even though it’s not on their supported site list.
  6. Well… that was easy.

I didn’t expect a great package to be available out of the box for the problem at hand, so kudos to these wonderful devs.
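
For the curious, usage is about as minimal as it gets. Here’s a rough sketch (the URL is a placeholder, and the exact method names may differ by version, so check the project’s README):

from recipe_scrapers import scrape_me

# Placeholder URL – point this at a real recipe page.
scraper = scrape_me("https://www.example.com/recipes/garlic-butter-pasta")

print(scraper.title())         # recipe name
print(scraper.total_time())    # time in minutes
print(scraper.ingredients())   # list of ingredient strings
print(scraper.instructions())  # the preparation steps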

For my next post I will test combining spidering and downloading to create a cohesive “cookbook” from a target site.

Converting TIMESTAMP to VARCHAR in TSQL or Replacing fn_varbintohexstr

Just a quick one in case I forget – I had a scenario where I needed to replace fn_varbintohexstr while converting an existing product to something more Azure SQL DB friendly.

I did some googling and didn’t like the results, especially since most of them return NVARCHAR(MAX) instead of what they generally should – just VARCHAR(length of the data) – and in my case (timestamps) the value was always 18 characters.

The key was converting to binary first, and then using the ,1 style parameter on CONVERT.

select convert(varchar(18), convert(varbinary(18), row_version), 1)
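
To illustrate against a hypothetical table (dbo.Orders and OrderId are placeholder names) – an 8-byte rowversion yields 16 hex digits, and style 1 keeps the 0x prefix, which is where the 18 comes from:

-- dbo.Orders / OrderId are placeholders for your own table and key column.
select top (5)
       OrderId,
       row_version,
       convert(varchar(18), convert(varbinary(18), row_version), 1) as row_version_hex
from dbo.Orders;
-- e.g. a rowversion of 0x00000000000007D1 comes back as the 18-character string '0x00000000000007D1'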

That’s all folks.

Short and Sweet – Implementing simple character masking without UDFs

I’ve been on a bit of a hiatus with the move to California and figuring out my new life here.

I needed some inline TSQL to mask a string based on a simple masking input, like ‘##xx##’ where # would return data and X would return X.

Here’s the code, actually a bit simpler than I thought it would be.

The first table would be the one you are actually using (I used sys.objects as an example), and if you wanted to store a masking string, you’d just want to make sure it was joined in and projected into the SUBSTRING instead of the @formatstring.
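
A minimal sketch of that approach, assuming a @formatstring mask and sys.objects as the demo table: a tally of character positions drives a SUBSTRING per character, and FOR XML PATH('') stitches the result back together – no UDF needed.

declare @formatstring varchar(50) = '##xx##';

-- For each object name, walk the mask one position at a time: where the mask has '#'
-- keep the original character, otherwise output the mask character itself.
-- FOR XML PATH('') concatenates the per-character results back into one string.
select o.name,
       masked_name = (
           select case substring(@formatstring, n.n, 1)
                      when '#' then substring(o.name, n.n, 1)
                      else substring(@formatstring, n.n, 1)
                  end
           from (select top (len(@formatstring))
                        row_number() over (order by (select null)) as n
                 from sys.all_columns) as n
           order by n.n
           for xml path('')
       )
from sys.objects as o;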

Would you be involved with PASS without a SQL Saturday?

I wanted to bring you a quote from the #SQLSaturday channel in the SQL Community Slack that I found amazing:

“They [the PASS board] are already signaling they don’t want to spend $$ on SQL Sat and a few of the board members would just as soon SQL Sats die.”

Having missed the Q/A at the Summit, this blew my mind.

Everyone I know who joined PASS had a requisite SQL Saturday or local user group to get them interested and welcomed into the community.
Even after my second SQL Saturday I didn’t really recognize what PASS was (beyond my local UG) and if someone had asked about the international org, I would have responded with (at best) “uh, sure, I guess?”

In my opinion, the local user groups are the primary thing that matters. The national/international components of the org may put out a few high-quality webinars on a weekly basis, but 99% of that content is not related to my problems at hand. The local UG/SQL Saturday gives me valuable time with experts without having to pay thousands of dollars to get on a plane or buy a ticket to the Summit.

If you ask the question “Would the SQL PASS Summit exist without SQL Saturday?” I think the answer is no – or if yes, it would be a much smaller, less welcoming affair.

Agree or disagree?

I would love to hear more from the community about whether they cherish SQL Saturday and think it should be funded, or whether it seems like a dead-end activity for PASS.

Slack Asks: Scripting out all objects from a SQL Server database with PowerShell and dbatools

@brianszadek says: I am looking to automate the scripting of objects with PowerShell for easier source control integration.

I thought this would be a little bit easier, but it looks like the SMO way of doing this is still fairly complicated. Maybe something I can look at simplifying for the project in a bit 🙂

For completeness I also included my favorite tool (schemazen) and Microsoft’s new scripting tool if you want a quick shell approach.

PowerShell: (dbatools)

$SqlInstance = "192.168.0.23,1433"
$DatabaseName = "breakme"
$Server = Connect-DbaInstance $SqlInstance
$Database = $Server.Databases[$DatabaseName]

# Define an empty list
$ObjectList = @()
$Options = New-DbaScriptingOption
$Options.AllowSystemObjects =
  $Options.AppendToFile = $false

$Options.Triggers =
  $Options.ToFileOnly =
  $Options.ScriptBatchTerminator =
  $Options.DriAll =
  $Options.Indexes =
  $Options.Default =
  $Options.SchemaQualify =
  $Options.ClusteredIndexes =
  $Options.NonClusteredIndexes = $true

# Surprisingly, $Database.EnumObjects will return a set of strings, not SMO objects, this is kind of sucky.
$ObjectList += foreach ($ob in $Database.EnumObjects()){ # Get all objects
  if ($ob.Schema -notin "sys","INFORMATION_SCHEMA"){ # Not in system schemas
    $o = $Database.Resolve($ob.Urn) # Resolve the object path to an SMO object
    if (($o | Get-Member | Select-Object -exp Name) -contains "Script"){ # If it has a script property
      $o # Return it
    }
  }
}


function Rename-ObjectToSqlFile {
  param ($ObjectName)
  $Name =  $ObjectName -match '.*\/(.*)'
  if ($Name -eq $false){
    $Name = $ObjectName
  } else {
    $Name = $matches[1]
  }
  $Name = $Name + ".sql"
  return $Name
}

# Iterate, get one file per object, decide on how you want to name them, because things like event notifications are uris
foreach ($obj in $ObjectList) {
  $Name = Rename-ObjectToSqlFile -ObjectName $obj.name
  Export-DbaScript $obj -ScriptingOptionsObject $Options -Path $Name # Export it with the dbatools powers
}
# Or export based on category
Export-DbaScript $ObjectList -ScriptingOptionsObject $Options # Export object categories to each file

Note: Export-DbaScript currently has a bug that prevents servers using SQL Authentication from scripting, this has been fixed in the next version.

Command line executables

Schema Zen (C#) (Free, OSS)

If you haven’t heard of SchemaZen, it’s a great and insanely fast tool for exporting all the objects in a database to a folder and, if needed, taking that same folder and creating a database from it.

There are also some experimental data export features.

MSSQL Scripter (python) (Free, OSS)

If you haven’t used the MSSQL Scripter, it’s Microsoft’s tool that utilizes the new SqlToolsService to request and script out DDL information. They have a lovely architecture page if you want to learn more.

“Fixing” Visual Studio Code or PowerShell ISE Cached Variables

One of the features of both Visual Studio Code and the PowerShell Integrated Scripting Environment is allowing you to run a line of code with the F8 key.

This is super cool and useful, but it has a side effect: to validly run a specific line of code, you often need to have stored all the results up to that point.

In my experience, I have received all sorts of strange behavior from both PowerShell ISE and Visual Studio code that came down to “oh, I remembered the last time you used that variable and now I am going to cause you problems.”

When I am writing code, I expect that before I declare a variable it doesn’t exist, and after I declare it, it does. We can see that behavior below:
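
A rough sketch of the kind of check I mean (the variable name is a placeholder) – count the session’s variables, declare a new one, and count again:

# In a fresh session: count the variables, declare a new one, and count again.
(Get-Variable).Count           # e.g. 52

$BrandNewVariable = "hello"

(Get-Variable).Count           # e.g. 53 - the new variable now exists
Get-Variable BrandNewVariable  # and it can be retrieved by name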


However, in both VS Code and the ISE, if you run the same code again, you might be surprised to see the number of variables cease to change.

While the benefits are clear and deliberate, I prefer to be able to run from a clean slate each time, and PowerShell gives us a few mechanisms to do that. I thought about Get-Variable/Remove-Variable, but there are no clear properties that I can see to distinguish user variables from environment ones, so this is what I came up with:
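
A minimal sketch of the idea (the variable names are just examples): wrap the working code in a scriptblock and invoke it, so everything stays inside the scriptblock’s own scope.

& {
    # Everything declared in here lives only in the scriptblock's scope.
    $RowCount = 42
    $Results  = 1..$RowCount | ForEach-Object { $_ * 2 }
    "Inside: RowCount = $RowCount, Results count = $($Results.Count)"
}

# Back in the session, nothing leaked out:
Get-Variable RowCount, Results -ErrorAction SilentlyContinue   # returns nothing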

You can see that now that we define our code in a scriptblock, the variables do not leak out of their context, and we can run our code over and over without worrying about existing variables.

Generating Tables of Contents for Github Projects with PowerShell

Table Of Contents


Initial Idea

Let’s generate a table of contents for a GitHub Flavored Markdown document in PowerShell (because nobody else seems to have.)


Clarifying Our Requirements

After talking with Claudio, he didn’t just want a TOC for a particular markdown file (though that’s nice), but he also wanted a TOC for a set of files in a folder. I decided to do the first one, and it looks like Claudio just finished the second.

I did some googling and found that GitHub Flavored Markdown supports a special form of anchor linking that isn’t supported in vanilla MD. I also found that beyond this there wasn’t a good anchor-link MD shortcut, so I took this tack (which cares more about readability than speed or length.)

You can use the function below, which was also run on this blog post to generate itself.


The Code

function Convert-StringToMarkdownTableOfContents {
    param ([string]$MarkdownDocument)

    $nl = [System.Environment]::NewLine
    $TOC = "## Table Of Contents$nl$nl"

    foreach ($line in $MarkdownDocument.Split( $nl )){
        if ($line -match "^(#+)\s(.*)$") {
            $Count = $Matches[1].Length
            $Content = $($Matches[2])

            $EncodedContent = ($Content -replace " ","-").ToLower()

            if ($Count -eq 1){
                $TOC += "- [$Content](#$EncodedContent)$nl"
            }
            else {
                $TOC += "$("  " * $Count)* [$Content](#$EncodedContent)$nl"
            }

            $MarkdownDocument = $MarkdownDocument -replace "(($line)\w*)",$(
                '<a href="#' + $EncodedContent + '"></a>' +
                $nl +
                $line
            )
        }
    }
    return $TOC + $nl + $MarkdownDocument
}
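
A quick usage sketch (the file names are placeholders): read a markdown file in as a single string, then write the TOC plus the anchored document back out.

# -Raw gives us the whole file as one string, which is what the function expects.
$Document = Get-Content -Path .\post.md -Raw
Convert-StringToMarkdownTableOfContents -MarkdownDocument $Document |
    Set-Content -Path .\post-with-toc.md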

You can download this code which generates this post Here.


Special Thanks

Special thanks to Cláudio Silva (PowerShell MVP) for giving me this fun idea!

Note:
Unlike GitHub Flavored Markdown, WordPress Markdown does not implement the anchor links, so on my blog they won’t be clickable like they would be on GitHub, sorry!

Automate and Simplify DBCC CHECKDB

T-SQL Tuesday Logo

As a part of the T-SQL Tuesday run by Rob Sewell this month, I wanted to throw in a little post about using Test-DbaLastBackup, which I will also be showing off at this year’s SQL Summit!


Come to the Precon

If you like this post, don’t forget to sign up for Chrissy Lemaire’s and my precon at the 2017 SQL Summit for more awesome PowerShell <3 SQL Server magic with dbatools!

Test-DbaLastBackup

As data professionals we know that safeguarding our data is one of our most important jobs, and yet a significant number of us continue to find excuses not to verify the integrity of our data.

I want to show you an uber-simple method to automate testing your entire estate’s worth of backups.

Before we move forward, I want to note this process does perform a restore and DBCC CHECKDB on the server you specify, so please don’t run this in production and then come to me upset that your IO and CPU time shot through the roof 😀

However, if you only have a smallish or sleepy production environment, this will run just fine side by side; otherwise, I demonstrate a parameter below to run it against a different server.

Lastly, before we begin, this assumes you’ve already installed and imported dbatools, because why wouldn’t you have?

Code

"sql2012","sql2005" | Test-DbaLastBackup -Destination "Sql2016"

This:

  • Defines a list of two servers (PowerShell lists are as easy as “”,””)
  • Pipes them to the Test-DbaLastBackup command.
  • Which then:
    • Gathers information about the last full backups for all of your databases on that instance.
    • Restores the backups to the Destination with a new name. If no Destination is specified, the originating server will be used.
    • The database is restored as “dbatools-testrestore-$databaseName” by default, which you can change with the -Prefix parameter.
    • The internal file names are also renamed to prevent conflicts with original database.
    • A DBCC CHECKTABLE is then performed.
    • And the test database is finally dropped.

If you are super stuck on the restore portion but would like to do a VERIFYONLY, there is a -VerifyOnly parameter switch.

Some other useful parameters you might need to use in a typical scenario:

-Database Allows you to specify a list of databases to specifically check, useful if you have just one in mind.

-MaxMB Prevents databases over X MB from being restored.

-ExcludeDatabase Exclude specific databases (for that one gigantic one you know you don’t want to touch.)

-DataDirectory Alternate data directory (besides default) you want to restore the database files to test.

-LogDirectory Alternate log directory (besides default) you want to restore the database files to test.
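
Putting a few of those together, here’s a sketch of a typical call (the instance, database, and directory names are placeholders):

# Test everything on sql2012 except one huge database, restore to Sql2016,
# skip anything over ~50 GB, and restore the test files to dedicated directories.
Test-DbaLastBackup -SqlInstance "sql2012" -Destination "Sql2016" `
    -ExcludeDatabase "BigWarehouse" -MaxMB 51200 `
    -DataDirectory "D:\RestoreTest\Data" -LogDirectory "L:\RestoreTest\Log"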

Watch

If you like video, wait no more – here’s a YouTube demo of Test-DbaLastBackup:

https://www.youtube.com/watch?v=50xEuEZr6as

Thanks for your time, and I hope you enjoyed.