Finding SQL varchar(max)

SQL Error in R

SQL code to find the columns of max that need to be moved to the end of code for importing to R

Zoë Turner
02-13-2021
Photo of tree branches in silohette

Figure 1: Photo of tree branches in silohette

Import problems

Tweet: Dates stored as varchar is the SQL equivalent of 'red pants in the white wash'

Figure 2: Tweet: Dates stored as varchar is the SQL equivalent of ‘red pants in the white wash’

@_JohnMackintosh posted in Twitter about his ‘red pants in the wash’ being those date columns that are in fact varchar(). I’ve been very lucky as our data warehouse people have consistently formatted dates in the SQL tables including my favourite integer yyyymmdd format which beautifully imports into R with no issues.

What I have found though, which is my ‘red sock in the wash’, is varchar(max) or nvarchar(max) data which, it seems, causes an import error unless these columns are placed at the end of the select script or recoded by using something like CAST(colm AS varchar(150)).

My sock

The first time I noticed this, it was because of a table I had built myself. I had put in a column for comments and naturally went for varchar(max). To solve the problem I just changed the data type for the column - because I could.

Others’ socks

The next time I encountered this, which wasn’t too long after, it was in a database I couldn’t update so I found my import scripts to R failing with an error:

Error in result_fetch(, n) : nanodbc/nanodbc.cpp:3011: 07009: [Microsoft][ODBC SQL Server Driver]Invalid >Descriptor Index Failed to execute SQL chunk

which isn’t really descriptive but searching for the error brought this up for the ODBC package:

GitHub Issue: Short story is this is a bug in the MIcrosoft Driver with varchar(max) columns and they need to be at the end of the select query to work.

Figure 3: GitHub Issue: Short story is this is a bug in the MIcrosoft Driver with varchar(max) columns and they need to be at the end of the select query to work.

Finding the socks

Some of the tables I was working with have nearly 100 columns and whilst I can scan through for them, I’ve recently had to import about 20 tables so this would be long-winded.

First, find your data type number:


-- USE database_example

SELECT system_type_id, name
FROM sys.types
WHERE system_type_id = user_type_id

Then when you know what varchar(max) or even nvarchar(max) is listed under for your own database search for it with this:


-- USE database_example

SELECT
    SchemaName = s.name
    ,o.type_desc
    ,ObjectName = o.name
    ,ColumnName = c.Name
FROM sys.objects AS o
LEFT JOIN sys.schemas AS s ON o.schema_id = s.schema_id
LEFT JOIN sys.all_columns AS c ON o.object_id = c.object_id
WHERE c.system_type_id IN (231, 167)
AND c.max_length = -1

--Other useful code to restrict what is returned
AND s.name IN ('schema_name') -- restrict to the schema
AND o.type_desc = 'VIEW' -- or USER_TABLE 
AND o.name LIKE '%table_name%' -- or use = 'precise_table_name'

Moving the socks

Unfortunately, now I can’t use the really straight forward script:


SELECT *
FROM schema.Table

and instead I have to list out all 100 columns and move those that are varchar(max) to the end. Still, this code saves lots of peering at the screen.

Happy bouncing blue and white sock

Figure 4: Happy bouncing blue and white sock

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Turner (2021, Feb. 13). Blog: Finding SQL varchar(max). Retrieved from https://philosopher-analyst.netlify.app/posts/2021-02-13-finding-sql-varcharmax/

BibTeX citation

@misc{turner2021finding,
  author = {Turner, Zoë},
  title = {Blog: Finding SQL varchar(max)},
  url = {https://philosopher-analyst.netlify.app/posts/2021-02-13-finding-sql-varcharmax/},
  year = {2021}
}