Select Indistinct

Removing Microsoft’s News In The Taskbar

So in a “not shit crap again” post – MSFT released a “News” update that takes up space on your taskbar and just lets Microsoft serve you more ads.

This frog-boiling approach to modern privacy is so annoying, so here’s the registry path to disable it.

Cool that you get to open the registry instead of just being presented with a “Would you like a new informative tool by Microsoft?” prompt. It’s almost like they know that NOBODY wants this.

Remember – don’t just randomly run registry files – they are plain text documents interpreted in a special way – so crack them open and make sure you feel comfortable running them.

You can review/download the data here:

How do companies handle blue green deployments with their SQL Server Database?

An interesting discussion/question in the SQL Community Slack today arose around how to implement blue/green deploys.

If you’re not familiar – blue/green refers to a deployment strategy with at least two hosts of your services, where you host in the green, deploy to the blue, and slowly drain the green to the blue until it becomes the green.

This has consequences in terms of keeping the lights on for both services, potentially rolling back the traffic to the green node (if the blue deployment fails some tests) and identifying things like dead code/data paths.
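As a rough illustration of the drain-and-rollback idea above, here is a minimal Python sketch. It is not tied to SQL Server or to any real load balancer: `shift_traffic` and the `healthy` callback are hypothetical names made up for this example, and the "traffic share" is just a number.

```python
def shift_traffic(steps, healthy):
    """Drain traffic from green to blue in equal steps, rolling back on failure.

    `healthy` is a hypothetical callback: it receives the share of traffic
    currently on blue (0.0 to 1.0) and returns True if blue passes its tests.
    """
    history = []
    for step in range(1, steps + 1):
        blue_share = step / steps
        history.append(("shift", blue_share))
        if not healthy(blue_share):
            # Blue failed a check: send all traffic back to green.
            history.append(("rollback", 0.0))
            return history
    # Blue serves everything now, so it becomes the new green.
    history.append(("promote", 1.0))
    return history
```

Calling `shift_traffic(4, lambda share: True)` walks blue up to 100% and promotes it, while a callback that starts failing partway through ends the run with a rollback entry. The hard part for databases (keeping both versions working against the same data) is exactly what this sketch leaves out.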

I was pondering how to answer the question in anything but the most generic way when this YouTube video by Kevin Feasel was posted, and it’s such a great resource that I’m reposting it.

Thanks @reid and Kevin!

Fixing Unicode Conversion Issues in XML documents, TRY_CONVERT returns question mark instead of NULL

An interesting question asked by @danthesqlman in #sqlhelp (sqlcommunity.slack.com)

Having issues with Unicode in my XML, tried using a try_convert(varchar,fieldname) but not returning NULL.
Set it to have a test on my box, and weird results.
declare @n nvarchar(10) = N'ניקודות'
select try_convert(varchar(10),@n)
This doesn’t return NULL, but ?????????
I’m curious what would I be doing wrong, or how can i locate unicode within XML easily

And then when people suggested individual character shredding –

XML documents in a table over 200k rows, 2mb xml each, could take hours to parse 1 character at a time

There were a few suggestions (my initial crap one was just dumping it to C#), but after a few jokes back and forth about how SQL Server was just returning normal question marks for TRY_CONVERT, and how silly that was, the idea came up… why not just:

  1. Replace all question marks with something unique? (I suggested a GUID)
  2. Run the conversion and then do a reverse replace, updating the data in place.
  3. Profit!

For a simple code example…

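-- Sentinel that should never occur in the data. It is stored as a string
-- so that REPLACE receives string arguments.
DECLARE @magic_value NVARCHAR(36) = CONVERT(NVARCHAR(36), NEWID());

SELECT
    TRY_CONVERT
    (
        VARCHAR(100),
        REPLACE(tar.name, N'?', @magic_value) -- hide the genuine question marks first
    )
FROM target_table AS tar;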

Any new question marks that exist in the output would be characters that failed the conversion process.
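To make the full round trip concrete (mask, convert, count, restore), here is a small Python sketch of the same trick. It is an analogy, not the SQL that was run: Python's `errors="replace"` stands in for SQL Server's lossy conversion, `cp1252` is an assumed target code page, and `find_lossy_characters` is a name made up for this example.

```python
import uuid


def find_lossy_characters(text, encoding="cp1252"):
    """Return (converted_text, lossy_count) for a narrowing conversion."""
    # A hex GUID is plain ASCII, so it survives the conversion untouched.
    sentinel = uuid.uuid4().hex
    # 1. Hide the genuine question marks behind the sentinel.
    masked = text.replace("?", sentinel)
    # 2. Convert; characters the code page cannot hold come back as "?".
    converted = masked.encode(encoding, errors="replace").decode(encoding)
    # 3. Every "?" present now was produced by the conversion itself.
    lossy_count = converted.count("?")
    # 4. Put the genuine question marks back.
    return converted.replace(sentinel, "?"), lossy_count
```

For example, `find_lossy_characters("why? ניק")` reports three lossy characters, while the question mark that was already in the text is left alone and not counted.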

The test ran in ten minutes instead of a few hours… great!

Also a fun followup on weird SQL Server homoglyph conversion issues in general – https://bertwagner.com/posts/how-unicode-homoglyphs-can-thwart-your-database-security/

Disable powershell update nag in one line

[System.Environment]::SetEnvironmentVariable("POWERSHELL_UPDATECHECK",0,[System.EnvironmentVariableTarget]::User)

To be clear – I think you should be updating your PowerShell regularly, however the HUGE WHITE BLOCK ACROSS MY ENTIRE SCREEN EVERY TIME I LAUNCH VISUAL STUDIO CODE ISN’T GREAT.

Hated that caps? Yeah, that’s basically my eyes every time I see this nag window inverting the colors across my ennntiiirrreee screen.

I checked the PS repo and some of the one-liners posted there didn’t work (they used SetEnvironmentVariableTarget, which was not a method I had?), so I wanted to make this easy in case you are getting frustrated with the PowerShell update version check message, want it to go away, and don’t want to crack open the environment variables.

Now go update your PowerShell 🙂

Awesome tools: Papa Parse

My most recent problem for my side project was implementing a drag and drop upload for CSV (character separated value) data.

From experience I know that CSV data can fairly easily be malformed, so I wanted to be able to present a nicely formatted list for a user to preview and opt out of certain items.

I started reading and writing some code to take a dropped file (which turns out to be straightforward in HTML5) and call some sort of preview function with it.

After a few frustrating hours and an active disinterest in node modules, I found a tool that really solved the problem right for me: Papa Parse.

Somewhere between a spreadsheet and a set of buttons I built a simple CSV preview.

As I would qualify my JavaScript as “learning level”, one of the big wins for me is any tool which solves my problem in a sane and reusable way without going off the deep end into JS land.

This was the entire set of code I needed to take a file object and return a useful set of rows.

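let config = {
  delimiter: "", // auto-detect
  newline: "", // auto-detect
  quoteChar: '"',
  escapeChar: '"',
  header: true, // first row supplies the field names
  skipEmptyLines: true,
  delimitersToGuess: [',', '\t', '|', ';', Papa.RECORD_SEP, Papa.UNIT_SEP]
};

// For string input the parsed rows are returned directly (in useful_data.data);
// for a File object Papa Parse hands the results to a `complete` callback instead.
let useful_data = Papa.parse(data, config);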

Shout out to jQuery for the rest – on finishing the load, call a function that draws a form with some checkboxes, and a submission form for them all.

Useful Django Bits

I have been busy working on some other non-SQL side projects, and I wanted to note some of the pieces of code I have been appreciating recently.

https://github.com/PaesslerAG/django-currentuser is a simple plugin that allows you to reference your current user context in your model’s various functions. This greatly simplified some user management functions within my codebase, as I could express it all in the model.

https://www.django-rest-framework.org/ is a powerful framework on top of Django that allows you to build a straightforward REST API. Django doesn’t have object based permissions out of the box, and I have been building out the next version of my codebase with it; it’s definitely a lot more pluggable than anything I designed.

https://pythoncircle.com/post/439/server-access-logging-in-django-using-middleware/ is an easy way to track user access – one migration adds a log table, and you get whatever you want out of each request flow. Be careful that you follow your GDPR/CCPA guidelines!

https://github.com/pennersr/django-allauth is something I have been investigating, but it seems a bit much for my goals. I will come back and update more about this soon.

First glance at PowerBI

A simple problem I think plenty of people are having – can you take our data and whip up a simplified BI tool UI? Today I am reviewing PBI.

The PBI Service allows you to publish your reports with the PBI Desktop tool, which requires you to set up an embedded PBI service in your Azure account.

To get data in PBI is straightforward and many data sources are acceptable – I am tempted to jump in with the SQL DB option, but as this is all prototype stages I stuck with the tried and true Excel. I know that this means I skip over the data load issue and the like, but that’s not the purpose of this activity.

As we are looking at an analytical dataset I created a calendar tab, a dimension tab, and two fact tabs with dummy data.

The PBI import process is pretty smooth on the first attempt but you’ll likely want to do some cleanup. Click on the modeling tab and verify it’s inferred your relations in the way you expect, or add a few of your own.

As the data was imported from Excel I noticed that the Sigma symbol was missing from many of the columns (a sign you can use the value in summary calculations) – because it was a numeric field with NULLs, and in the import process it was decided that the field was a string field.

For the purposes of this demo it was fine to switch from NULL to 0 values, so I updated the data in the sheet, updated the type in the model, and refreshed the data – no problem. This may not be the best solution if zero means something different from NULL in your calculations.
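A quick worked example of why that swap can matter, as a Python sketch with made-up numbers (the column values are not from the demo workbook):

```python
from statistics import mean

raw = [10, None, 20]  # made-up column with one missing value

# Treating the missing value as "unknown" leaves it out of the calculation.
average_ignoring_nulls = mean(v for v in raw if v is not None)

# Replacing it with 0 adds a real data point and drags the average down.
average_with_zeros = mean(0 if v is None else v for v in raw)
```

Here the average is 15 when the missing value is skipped and 10 once it has been turned into a zero, so sums are safe but averages and counts are not.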

I experimented with the various built in visuals and noted that there’s some pretty good third party “verified” visuals as well. I would say overall most of the visualizations are very straightforward to use with the one exception of the pie chart sub-groupings being the same colors.

Once I had published my starter visual, I immediately wanted to go back and make revisions, and the desktop client made that fairly easy until I wanted to remove a column. Removing a reference is seemingly verboten in PBI, and it would not merge the changes.

It looks like whenever you want to make a breaking change you might have a bit more GUI work to do, bringing me back to SSIS woes, and I didn’t see any nice ways to “update all references” or something like that, so this might be a tedious step if you are making many changes.

Other things that struck me:

The fact that there’s a desktop app (Windows only) for PBI publishing is a bit of a pain at the moment, but it was quick at manipulating the various visualizations – it’s just weird that it feels like some Electron-style app compromise (so why not just the web?)

Drilling into data, exporting data, analyzing data – these things are super nice compared to any previous reporting system I have worked with.

It was disappointing to learn that as a Pro PBI user you can’t share with other users without a Pro license (unless you pony up $4,995 a month to start.) It’s not a big deal, but hopefully one day there’s a usage based option.

Creating dashboards and apps is confusing – you can’t see a reference to anything unless you first go to a workspace, then you can create a dashboard from elements of a report. I don’t understand why this is so fragmented – just let me manage this in either the desktop client or at least make it available in the top level UI.

Grouping groups of strings within strings in TSQL.

So a friend of mine had a query puzzle – he needed to print some W2 forms, but the forms themselves only allow up to 4 groups of a value to be placed in a box or else they’d need to issue two forms.

The pickle is that the source data is freeform csv text, so what can we do on the database side?

I reached for my handy CROSS APPLY, a numbers table, windowing functions and a little modulo arithmetic – it’s not so bad to return a subgroup within a group in SQL as long as you are willing to lay out additional columns to count your grouping.
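The arithmetic is the interesting bit, so here is a minimal Python sketch of it rather than the T-SQL. It assumes comma-separated input and 4 slots per form; `split_into_boxes` and the sample values are made up for illustration. In SQL the `position` would come from a row number over the split values.

```python
def split_into_boxes(csv_text, per_form=4):
    """Give each value in a free-form CSV string a form number and a slot."""
    # Split on commas and drop blanks left by stray or doubled separators.
    values = [v.strip() for v in csv_text.split(",") if v.strip()]
    rows = []
    for position, value in enumerate(values):
        form_number = position // per_form + 1  # integer division picks the form
        slot = position % per_form + 1          # modulo picks the place on that form
        rows.append((form_number, slot, value))
    return rows
```

With five values, the first four land on form 1 in slots 1 to 4 and the fifth spills over to form 2, slot 1.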

Wrapping my head around object level permissions with Django

A quick aside, if you read my SQL stuff – I have been working on a side project with Python and Django (a web framework), with a focus on bringing some sanity to recipes.

For the site I needed to implement a system where a user owns recipes, and other users can’t modify them.

In Django there’s a built in auth model that supports both user registration and model level permissions (yay!), but it means that naively all users could edit all recipes (boo!)

I did some quick searching because I was hopeful that I could reuse some of the well thought out components already available.

The big options I found were django-guardian, Rules, django-rest, or figuring it out myself.

Checking in with Rules, it looks like it’s not so much a Django specific implementation as a way to implement rules + predicates to get things done. It seems interesting, but it’s not clear from the older examples provided how it’s supposed to work with a modern Django – I am going to skip it for now.

Checking in with django-guardian, I am seeing better examples that are relevant, but truthfully all I need is 1 owner as a permission.

On the face of it, I feel like a lot of these systems are really complex for my purposes – it’s cool to be the ultimate system but I just need a system that provides:

  1. Users which have permission to objects.
  2. Permission means you have all roles on the object (edit/delete).
  3. Each ownable object must have a creator/author on creation, who, once assigned, can automatically edit/delete the thing.

Since I don’t need a complex permission system, I don’t care about meta-magic, and I just want to design a straightforward set of gates to manage, I decided to go off on my own and work on implementing an author that’s checked.

I added an author field to my model, the current user sets it in the model if it’s not already set, and then I use the get_current_authenticated_user() function from the django-currentuser package to tell if the author matches the user on the current request.

I also implemented a simple map behind the scenes that connects forms to models to pages so that the add/edit functions can just do their work and pass the correct set of forms back to the correct set of pages – I will have to keep looking because it wasn’t clear how to infer the model’s form class automatically without some mightier Python than I currently possess.

I disable the form fields for niceties in the UI if you don’t own the object (so you don’t accidentally try and edit it), but to be sure I also prevent saving on the backend if things don’t match up, and there’s an added bonus of free CSRF protection built into Django.
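The gate itself is small. Below is a framework-free Python sketch of the idea, not the actual project code: `Recipe`, `PermissionDenied`, and `is_owned_by` are stand-ins made up for this example, and the current user is passed in explicitly where the real model would ask django-currentuser for it.

```python
class PermissionDenied(Exception):
    """Raised when someone other than the author tries to save."""


class Recipe:
    def __init__(self, title, author=None):
        self.title = title
        self.author = author

    def is_owned_by(self, user):
        # An object with no author yet is owned by nobody.
        return self.author is not None and self.author == user

    def save(self, current_user):
        if self.author is None:
            # First save: whoever creates the object becomes its author.
            self.author = current_user
        elif not self.is_owned_by(current_user):
            # Backend check, independent of any disabled form fields in the UI.
            raise PermissionDenied(f"{current_user} does not own {self.title!r}")
        return self
```

The first `save` claims the object for its creator, later saves by the same user go through, and a save by anyone else raises before anything is written. The UI can use `is_owned_by` to decide whether to disable the form fields.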

And so after all that I have a straightforward solution without even touching the Rules code – it took me more time to noodle over third party options than to just sit down and write it.

Recipe Scraping with Python

I am working on a new project with the aim of scraping a wide variety of data sources, with goals as various as arrest data and recipe websites.

I am going to catalog the project’s process through the various issues, and try to encapsulate as much as possible so that if there are components others want to use they can.

Here’s the process I am taking:

  1. Decide on a language.
    • What languages are well supported for web scraping?
    • Plenty, but of them I am the most familiar with Python.
  2. Decide on feature set.
    • We are going with a headless HTML scraper for now, JavaScript will be supported in future versions.
    • We need to populate some basic datapoints for a recipe, but we’ll progressively enhance sites that don’t have the things we want.
  3. Build initial targeted list of sites to scrape
    • This was done by my partner in crime, more on that later.
    • >200 sites listed to target specific delicious recipes.
  4. Find some useful resources for scraping sites with Python.
  5. Construct simple loop for testing/validating data.

This Sunday’s execution:

  1. Get an isolated VM up.
  2. Install python3 and python3-venv on Debian.
  3. Find there’s actually a few good tools which implement this in a more general way: https://github.com/hhursev/recipe-scrapers
  4. Test that out on a specific site (sorry, none of those links just yet!)
  5. Find that recipe-scrapers actually supports it… automagically, even though it’s not on their supported site list.
  6. Well… that was easy.
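One plausible explanation for the “automagic” part (an assumption on my side, not something I confirmed in the recipe-scrapers source) is that many recipe sites embed schema.org Recipe metadata as JSON-LD, which any scraper can read without site-specific rules. A minimal standard-library sketch of that idea follows; `JsonLdCollector` and `find_recipe` are names made up for this example, and it ignores variations like `@graph` wrappers or list-valued `@type`.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.blocks.append("".join(self._buffer))


def find_recipe(html):
    """Return the first JSON-LD object whose @type is "Recipe", or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed metadata rather than failing the page
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Recipe":
                return item
    return None
```

Given a page's HTML as a string, `find_recipe` returns the recipe dictionary (name, ingredients, and so on, depending on what the site publishes) or `None` when the page carries no such metadata.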

So I didn’t expect a great package to be available out of the box for the problem at hand, so kudos to these wonderful devs.

For my next post I will test combining spidering and downloading to create a cohesive “cookbook” from a target site.