Channel: All about SQLServer

How to create linked server between On-premise and Azure SQL data warehouse

In my previous post, I wrote about how to create a linked server between an on-premises SQL Server and Azure SQL Database.

In this post, let us see how to create a linked server between an on-premises SQL Server and Azure SQL Data Warehouse and, after establishing the linked server, how we can query Azure SQL Data Warehouse tables from the local on-premises SQL Server.

I created the Azure SQL Data Warehouse in my previous post, and for this example I have created a table "dwtable1" within the Azure SQL DW.

Below are the steps to create a linked server connection between the on-premises SQL Server and Azure SQL DW.




From SSMS, connect to the on-premises SQL Server. Go to Object Explorer -> Server Objects -> Linked Servers

Linked server - the linked server name. I have provided the linked server name AZSQLDW.
Provider - SQL Server Native Client 11.0
Data source - the Azure SQL Server name
Catalog - the Azure SQL data warehouse name. I have created the Azure SQL DW with the name trnazsqldw




Provide the remote login & password (Azure SQL Server authentication):
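The same linked server can also be scripted in T-SQL instead of using the SSMS dialog. Below is a minimal sketch; the data source, remote user and password are placeholders to be replaced with your own Azure SQL Server details.

```sql
-- Sketch: create the linked server to Azure SQL DW (placeholder server name)
EXEC master.dbo.sp_addlinkedserver
     @server     = N'AZSQLDW'
    ,@srvproduct = N''
    ,@provider   = N'SQLNCLI11'              -- SQL Server Native Client 11.0
    ,@datasrc    = N'yourserver.database.windows.net'
    ,@catalog    = N'trnazsqldw';

-- Map local logins to the remote SQL authentication credentials (placeholders)
EXEC master.dbo.sp_addlinkedsrvlogin
     @rmtsrvname  = N'AZSQLDW'
    ,@useself     = N'False'
    ,@locallogin  = NULL
    ,@rmtuser     = N'remoteuser'
    ,@rmtpassword = N'********';
```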



Now we can query the Azure SQL data warehouse from the local on-premises SQL Server using the four-part naming convention, as shown below:

SELECT * FROM Linkedservername.databasename.schemaname.tablename





--Select data in azure sql dw from on-premise sql server using four part naming convention
SELECT * FROM AZSQLDW.[trnazsqldw].dbo.dwtable1

--Insert data into azure sql dw from on-premise sql server using EXEC command
EXEC ( 'INSERT [trnazsqldw].dbo.dwtable1 (Id)
VALUES(1)' ) AT AZSQLDW;

--Update data in azure sql dw from on-premise sql server using four part naming convention
UPDATE AZSQLDW.[trnazsqldw].dbo.dwtable1 SET Id = 100
WHERE Id = 1

--Delete data in azure sql dw from on-premise sql server using four part naming convention
DELETE FROM AZSQLDW.[trnazsqldw].dbo.dwtable1
WHERE Id = 1


See Also:


SQL Server Export/Import excel using R script - Part 1


In this post, let us see how we can easily import from or export to Excel with just a few lines of R script executed from SSMS using sp_execute_external_script.

I have done the below example using SQL Server 2016 on Windows 10 (64-bit) and R (3.2.2).
Refer to my previous post on how to install R Services.

Also, I have used the "WideWorldImporters" sample database for this example.



Now let us jump to the prerequisite steps for this approach. Though the prerequisite steps are a bit heavy, they are one-time and thereafter enable us to export to / import from Excel easily with just a few lines of code.

Prerequisite Step 1: Check the version of R by running sessionInfo() from R.exe

In my case, I have installed SQL Server 2016, so R.exe is found in the below path:
C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\bin
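For reference, the check is a single call from the R console; the version lines in the comment below are what my 3.2.2 install printed and will differ on other machines.

```r
# Run from R.exe to confirm the R version and whether it is 32- or 64-bit
sessionInfo()
# The first lines of the output look similar to:
#   R version 3.2.2 (2015-08-14)
#   Platform: x86_64-w64-mingw32/x64 (64-bit)
```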



Prerequisite Step 2: From the above step I learned that it is 64-bit, so I downloaded the 64-bit Java JDK (.exe file) from here & installed it.
C:\Program Files\Java\jdk-11.0.

Prerequisite Step 3: Now install rJava package from R.exe

install.packages("rJava")




Prerequisite Step 4: Now let us set the JAVA_HOME environment variable to the Java JDK installed in Step 2 and load the rJava package to check that it is working from R.exe.

Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava)



Prerequisite Step 5: Now it's time to install & load the xlsx package from R.exe.
- The xlsx package is one of the powerful R packages to read, write and format Excel files.
- It is a Java-based solution and is available for Windows, Mac and Linux.
- It works for both Excel 2007 and Excel 97/2000/XP/2003 file formats (xls and xlsx).

install.packages("xlsx")
library("xlsx")



Prerequisite Step 6: Copy & paste the packages downloaded in the above steps from the default R library to the SQL Server R_SERVICES library.

I have copied the xlsx, xlsxjars & rJava folders from C:\Users\Sathy\Documents\R\win-library\3.2
to C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\library

Now, as the last prerequisite step, let us check the installed R packages from SSMS. To execute R script using sp_execute_external_script, refer to my previous post linked at the beginning of this post.

EXEC sp_execute_external_script @language = N'R', @script = N'OutputDataSet <- data.frame(installed.packages())'





Now, the script to export SQL Server table data into Excel:
The below script is pretty simple: I have selected only the first six key fields
from the [WideWorldImporters].[Sales].[Orders] table and passed the result as the input data set (data frame) to the write.xlsx2 function.

USE [WideWorldImporters]

DECLARE @SQL NVARCHAR(MAX) = N'
SELECT [OrderID]
,[CustomerID]
,[SalespersonPersonID]
,[PickedByPersonID]
,[ContactPersonID]
,[BackorderOrderID]
FROM [WideWorldImporters].[Sales].[Orders];'

DECLARE @RScript NVARCHAR(MAX) =N'
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
write.xlsx2(InputDataSet, file = "H:/Temp/Blog/ImportExportExcel/Sales.xlsx", sheetName="Order",
col.names=TRUE, append=FALSE) '

EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL


73,595 records were exported in ~10 seconds. Note that I have used "write.xlsx2" in the above R script, as it is faster than "write.xlsx".

More information on this topic, to be continued in part 2 ...

See Also: 

SQL Server Export/Import excel using R script - Part 2


In continuation of my previous post, SQL Server Export/Import excel using R script - Part 1, let us see a few other examples in this post.

In the previous post, I selected only the first six key fields from the [WideWorldImporters].[Sales].[Orders] table and passed the result as the input data set (data frame) to the write.xlsx2 function.

If instead I run "SELECT * FROM [WideWorldImporters].[Sales].[Orders]", I get the error shown in the below screenshot:



This is because data type conversions are performed implicitly, and some of the data types are not supported when data is passed between R libraries and SQL Server.


To work around this, we can exclude from the export the fields whose data types are not supported, as well as those that don't make sense to export (for example, a field with the image data type).

To work around the issue of data types that are poorly converted when data is passed between R and SQL Server, we can make use of TRY_CONVERT(VARCHAR(MAX), field name).
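As a quick illustration, a couple of date/time columns from the same Orders table could be wrapped like this (a sketch, not part of the script below):

```sql
-- TRY_CONVERT returns NULL instead of failing, so columns with data types
-- that convert poorly can still be passed to the R script as VARCHAR(MAX)
SELECT [OrderID]
      ,TRY_CONVERT(VARCHAR(MAX), [OrderDate])            AS [OrderDate]
      ,TRY_CONVERT(VARCHAR(MAX), [PickingCompletedWhen]) AS [PickingCompletedWhen]
FROM [WideWorldImporters].[Sales].[Orders];
```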

So in the below example, I am going to export the data of all tables (non-temporal tables) that fall under the "Sales" schema of the WideWorldImporters sample database.


USE [WideWorldImporters]
GO
SET NOCOUNT ON
DROP TABLE IF EXISTS #ExportTablesList, #FinalExportList

SELECT CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) Columns -- To cover poor data type conversions
,T.name TableName
,DENSE_RANK()OVER(ORDER BY T.name) TableCount
,TY.name DataType
INTO #ExportTablesList
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) = 'Sales'
AND temporal_type = 0 --NON_TEMPORAL_TABLE
AND TY.name <> 'image' -- Ignore the datatypes that are not required to be exported

SELECT TableName,MAX(TableCount) TableCount
, STUFF(
(
SELECT ', ' + C.Columns
From #ExportTablesList As C
WHERE C.TableName = T.TableName
FOR XML PATH('')
), 1, 2, '') AS Columns
INTO #FinalExportList
From #ExportTablesList As T
GROUP BY TableName

DECLARE @I INT = 1
,@TableName NVARCHAR(200)
,@SQL NVARCHAR(MAX) = N''
,@RScript NVARCHAR(MAX) = N''



WHILE @I <= (SELECT MAX(TableCount) FROM #FinalExportList)
BEGIN

SELECT @SQL = CONCAT('SELECT TOP 10 ',Columns,' FROM [WideWorldImporters].[Sales].',QUOTENAME(TableName),';')
,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I

SET @RScript = CONCAT('
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
write.xlsx2(InputDataSet, file = "H:/Temp/Blog/ImportExportExcel/Sales.xlsx", sheetName= "',@TableName,'",
col.names=TRUE, append=TRUE) ')

EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL


SET @I = @I + 1
END

In the above example, I have exported into a single Excel file with multiple sheets, using the table name as the sheet name. This is achieved by setting the argument "append=TRUE" in the write.xlsx2 function.

While exporting multiple tables' data, any specific configuration (say, exporting the top 100 records, ordering by a key field, or filtering on a particular field) can be done by tweaking the SQL that is built dynamically, as I have done in the above example.
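For instance, the line that builds @SQL in the above script could be tweaked as below to export the latest 100 rows of each table (a sketch; ORDER BY 1 DESC assumes the first column of each table is an ascending key):

```sql
SELECT @SQL = CONCAT('SELECT TOP 100 ', Columns,
                     ' FROM [WideWorldImporters].[Sales].', QUOTENAME(TableName),
                     ' ORDER BY 1 DESC;')
      ,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I
```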




Similarly, the data can also be exported to different Excel files (instead of different sheets within a single file), with dynamic file names, by tweaking the append & file arguments in the write.xlsx2 function. The below script does this:


USE [WideWorldImporters]
GO
SET NOCOUNT ON
DROP TABLE IF EXISTS #ExportTablesList, #FinalExportList

SELECT CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) Columns -- To cover poor data type conversions
,T.name TableName
,DENSE_RANK()OVER(ORDER BY T.name) TableCount
,TY.name DataType
INTO #ExportTablesList
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) = 'Sales'
AND temporal_type = 0 --NON_TEMPORAL_TABLE
AND TY.name <> 'image' -- Ignore the datatypes that are not required to be exported

SELECT TableName,MAX(TableCount) TableCount
, STUFF(
(
SELECT ', ' + C.Columns
From #ExportTablesList As C
WHERE C.TableName = T.TableName
FOR XML PATH('')
), 1, 2, '') AS Columns
INTO #FinalExportList
From #ExportTablesList As T
GROUP BY TableName

DECLARE @I INT = 1
,@TableName NVARCHAR(200)
,@SQL NVARCHAR(MAX) = N''
,@RScript NVARCHAR(MAX) = N''



WHILE @I <= (SELECT MAX(TableCount) FROM #FinalExportList)
BEGIN

SELECT @SQL = CONCAT('SELECT TOP 10 ',Columns,' FROM [WideWorldImporters].[Sales].',QUOTENAME(TableName),';')
,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I

SET @RScript = CONCAT('
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
write.xlsx2(InputDataSet, file = "H:/Temp/Blog/ImportExportExcel/',@TableName,'.xlsx", sheetName= "',@TableName,'",
col.names=TRUE, append=FALSE) ')

EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL


SET @I = @I + 1
END






More information on this topic, to be continued in part 3 ...

See Also:  

Reference: 

SQL Server Export/Import excel using R script - Part 3

In continuation of my previous posts on SQL Server Export/Import excel using R script - Part 1 & Part 2, where we saw how to export to Excel, let us now see how to import Excel into a SQL Server table using R script.

In the Part 2 post, I exported a few tables under the "Sales" schema from the "WideWorldImporters" sample database into a single Excel file with six sheets:

Sheet 1 - CustomerTransactions
Sheet 2 - InvoiceLines
Sheet 3 - Invoices
Sheet 4 - OrderLines
Sheet 5 - Orders
Sheet 6 - SpecialDeals

While importing, we have two scenarios:
1) Excel file structure is defined & known
2) Excel file structure is unknown and the table has to be created dynamically

Let us see an example of each scenario:




Scenario 1: Excel file structure is defined & known

For this scenario, let us try to import sheet 1 by setting the argument sheetName = 1 in read.xlsx2 function. As we know the file structure, table has been created beforehand.

SET NOCOUNT ON
DROP TABLE IF EXISTS CustomerTransactions_Import

--As we know the Excel file structure beforehand, creating an empty table to import the data from Excel sheet
SELECT NULL AS RowId,* 
INTO CustomerTransactions_Import 
FROM [Sales].[CustomerTransactions]
WHERE 1 = 2

DECLARE @RScript NVARCHAR(MAX) =N' 
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
    require(rJava);
require("xlsx");
    OutputDataSet <- read.xlsx2("H:/Temp/Blog/ImportExportExcel/Sales.xlsx", sheetName = 1,header=TRUE)
'
--Inserting from Excel
INSERT dbo.CustomerTransactions_Import
EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript

--SELECT * FROM CustomerTransactions_Import

--Below query compares the data in source table and data imported from Excel
SELECT Src.CustomerTransactionID,Imp.CustomerTransactionID 
FROM [Sales].[CustomerTransactions] Src
LEFT JOIN CustomerTransactions_Import Imp
ON Src.CustomerTransactionID = Imp.CustomerTransactionID






Scenario 2: Excel file structure is unknown and table has to be created dynamically

For this scenario, let us try to import sheet 2 by setting the argument sheetName = 2 in the read.xlsx2 function. As we don't know the file structure, the table will be built dynamically using the RevoScaleR function rxDataStep and populated with the data from the read.xlsx2 function.

DECLARE @RScript NVARCHAR(MAX) =N' 
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
    require(rJava);
require("xlsx");
    Output <- read.xlsx2("H:/Temp/Blog/ImportExportExcel/Sales.xlsx", sheetName = 2,header=TRUE)

sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=WideWorldImporters;Uid=sa;Pwd=***"
    sqlTable <- "InvoiceLines_Import"
if (rxSqlServerTableExists(sqlTable,  connectionString = sqlConnString))  rxSqlServerDropTable(sqlTable,  connectionString = sqlConnString)
    sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
    rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)'

EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript






The sqlConnString variable needs to be set with the server name, database name, username & password for the ODBC SQL Server connection.
The sqlTable variable is set with the name of the table to be created.


 More information on this topic, to be continued in part 4 ...


See Also:

SQL Server 2019 Installation

In this post, let us see how to install the SQL Server 2019 CTP 2.2 Evaluation Edition on a Windows 10 (64-bit) machine.



Download SQL Server 2019 for Windows from the link provided at the bottom of this post:




After downloading, right-click the setup file and select Run as Administrator.


Select the installation type, I did a Custom installation so that I can select the features to be installed:



Click on Hardware and Software requirements. The MSDN site will open; check whether the basic requirements of your machine (like OS and disk space) are sufficient for installation.


Click on Installation -> New SQL Server stand-alone installation:




We can ignore the warnings: 


I have selected all the features except PolyBase Query Service,
as PolyBase Query Service requires Oracle JRE, which is not installed on my machine.


Provide an instance name if any other version of SQL Server is already installed on the same machine, or use the default instance.



Select the authentication type and Add Current User as SQL Server administrators:



SSIS Scale out configuration:



Accept the consent to install R & Python:



After clicking on Install, it took around 30 mins.



Some of the tools & services are not installed as part of the above installation.
Open the setup file -> Installation (select the tool or service to be installed),
or we can download it from the web & install it separately.

I tried to install the SSMS tool (by clicking the option shown in the below screenshot):
The MSDN site opened in IE with the download link.





What's new in SQL Server 2019

SQL Server 2019 can be downloaded from here


Merry Christmas!!

SQL Server Export/Import excel using R script - Part 4

In continuation of my previous posts on SQL Server Export/Import excel using R script - Part 1, Part 2 & Part 3, where we saw how to export to Excel and import into a SQL Server table using the "xlsx" package, which is powerful but has an external dependency on Java (the "rJava" package).

In this post, let us see how to import an Excel file with multiple sheets into different SQL Server tables using the "readxl" package.


I have done the below example using SQL Server 2019 on Windows 10 (64-bit) and R (3.4.4).
Refer to my previous post on how to install R Services.

I have also used the WideWorldImporters sample database for data setup in this example.
Before trying to import an Excel file with multiple sheets into SQL Server tables using the "readxl" package, I tried to re-create the Excel file using the script provided in Part 2.

I got the below error:








Error in .jnew("java/io/FileOutputStream", jFile) : 

  java.io.FileNotFoundException:  (Access is denied)


This is because of the Java language extension support in SQL Server 2019, so I had to grant folder access (read, write) to "ALL_APPLICATION_PACKAGES".




Step 1:

Now let us install the required packages for this example:

Open R.exe from below path:
C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin

From R.exe, execute install.packages("readxl") 

Step 2:

After successful installation of the above package, copy & paste the packages downloaded in the above step from the default R library to the SQL Server R_SERVICES library.

From C:\Users\Sathy\Documents\R\win-library\3.4
to C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\library

Step 3:

The below script will import an Excel file with six sheets into six different tables in a database called "Staging", which I created before executing this script:

Sheet 1 - CustomerTransactions
Sheet 2 - InvoiceLines
Sheet 3 - Invoices
Sheet 4 - OrderLines
Sheet 5 - Orders
Sheet 6 - SpecialDeals



DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
library(foreach);
    filepath_var <- "H:/Temp/Blog/ImportExportExcel/Sales.xlsx"
    tab_names <- excel_sheets(path = filepath_var)
sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

foreach(n = 1:length(tab_names)) %do%
{
sqlTable <- tab_names[n]
Output <- read_excel(path = filepath_var, sheet = n)
sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
}'

EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript



I have updated the above script to make sure it doesn't error out if the file does not exist in the path or if any of the sheets is empty. Below is the updated script:

To test the script, I opened the "Sales.xlsx" file, added an empty sheet and saved it.



DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
    library(foreach);
    
filepath_var <- "H:/Temp/Blog/ImportExportExcel/Sales.xlsx"
sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

#Check file exists in path
if (file.exists(filepath_var)) {

    tab_names <- excel_sheets(path = filepath_var)
    
   foreach(n = 1:length(tab_names)) %do%
   {
  sqlTable <- tab_names[n]
  Output <- read_excel(path = filepath_var, sheet = n)
   
   #check  if the sheet has header 
   if (dim(Output)[2] != 0) {

  sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
  rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
  }
  }
  }'

EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript




dim(Output)[1] - gives us the number of rows
dim(Output)[2] - gives us the number of columns

In this example, if there are no columns, I assume the sheet is empty and do no further processing.


More information on this topic, to be continued in part 5 ...

See Also:

SQL Server Export/Import excel using R script - Part 5

In continuation of my previous posts on SQL Server Export/Import excel using R script Part 1, Part 2, Part 3 & Part 4, let us see in this post how to import multiple Excel files in a folder (including sub-folders) into different SQL Server tables created dynamically based on the Excel file structure.




For this example, I am going to make use of the Excel files generated using the second script in the Part 2 post.
I have also created a sub-folder named "Copy" and copied the same set of files into it, to verify that the script traverses all the folders and works fine.

Also, to test that the script can ignore empty sheets, I manually opened one or two Excel files, added an empty sheet and saved them.

In this example, the SQL Server table name is formed as file name + sheet name, with all special characters and spaces removed from the file/sheet name, to make sure the SQL Server table name is unique and per MSSQL standards.





As you can see in the above screenshot, the table names prefixed with Copy are imported from the sub-folder.

Below is the script that iterates through multiple files with multiple sheets and creates each as a separate table in SQL Server:



DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
    library(foreach);
    sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

    list_filespath <- list.files(path = "H:/Temp/Blog/ImportExportExcel", pattern = "*.xlsx", all.files = FALSE,
           full.names = TRUE, recursive = TRUE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)

    list_filenames <- list.files(path = "H:/Temp/Blog/ImportExportExcel", pattern = "*.xlsx", all.files = FALSE,
                        full.names = FALSE, recursive = TRUE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)

#Check file exists in path
    if (length(list_filespath) != 0) {

 foreach(m = 1:length(list_filespath)) %do%
 {
    filepath_var <- list_filespath[m]
    tab_names <- excel_sheets(path = filepath_var)

 foreach(n = 1:length(tab_names)) %do%
 {
  sqlTable <- paste(strsplit(list_filenames[m],split = ".xlsx"),tab_names[n],sep = "_") 
  sqlTable <- gsub("\\s", "", sqlTable)
  sqlTable <- gsub("[^[:alnum:]]", "", sqlTable)
  Output <- read_excel(path = filepath_var, sheet = n)

   #check  if the sheet has header 
   if (dim(Output)[2] != 0) {

  sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
  rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
 }
 }
 }
 }'

EXEC   sp_execute_external_script
      @language = N'R'

     ,@script = @RScript



More information on this topic, to be continued in part 6 ..


See Also:

SQL Server Export/Import excel using R script - Part 6

In continuation of my previous posts on SQL Server Export/Import excel using R script Part 1, Part 2, Part 3, Part 4 & Part 5, let us see in this post how to invoke the .sql files in a folder and export the output of each query as an Excel file.

As a prerequisite, we need the R packages "xlsx" & "rJava"; I have shown how to install and load them in my previous posts.

As we are going to make use of the PowerShell SQL Server cmdlet "Invoke-Sqlcmd", let us make sure the SQL Server PowerShell modules are downloaded & installed from PowerShell.

For this example, I have created below folder structure:







Temp folder - a placeholder for staging files created in the intermediate process
Script folder - has the R script that runs this entire process
BusinessReports folder - where the Excel reports are exported
SQL folder - contains the SQL files to be executed

For this example, I have placed three .sql files in the folder; the SQL inside each file is shown in the below screenshot. All three SQL queries select data from three different tables in the "WideWorldImporters" sample database.







Explanation of how this approach works:

R invokes the PowerShell command using the "system2" function. The PowerShell command then reads each SQL file (using Invoke-Sqlcmd) and writes the SQL result to a text file (just as staging). Only one staging file is created for the entire process of exporting the SQL files' output to Excel, and it is deleted at the end of the process. Using the data in the staging text file on each iteration, an Excel file is created with the "xlsx" R package.

Below R script does everything what I have mentioned in above explanation:





I have saved the above script in the Script folder; it can either be manually opened & executed from any R environment (R Studio etc.) or invoked from the command line.



The below screenshot shows how to invoke the R script from Command Prompt by using Rscript.exe in the SQL Server folder:

Go to Run -> cmd ->



This command line can also be created as a SQL Agent job and scheduled as per requirement.

CD "C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin"
Rscript.exe "H:\Temp\Blog\ImportExportExcel\Script\SQLExportExcel.R"

After executing the R script from Command Prompt or R Studio:
We can see the output of the SQL files in Excel format within a directory created with the date; the Excel file & sheet name is the SQL file name.




Executing the R script on the same day just overwrites the directory & Excel files without error, and a separate directory is created when the script is executed on the next day.

I tested the scenario where one of the SQL files has incorrect syntax; the script still works for all the other SQL files and runs the entire process.

Using the password argument in the write.xlsx2 function, the exported Excel files can be password-protected. But unfortunately, writing password-protected workbooks is supported for the Excel 2007 OOXML format only.

Below is the complete code block:

library(foreach);
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
#getting list of sql files & file names from folder
list_filespath <- list.files(path = "H:/Temp/Blog/ImportExportExcel/SQL", pattern = ".sql", all.files = FALSE,
                             full.names = TRUE, recursive = FALSE,
                             ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
list_filenames <- list.files(path = "H:/Temp/Blog/ImportExportExcel/SQL", pattern = ".sql", all.files = FALSE,
                             full.names = FALSE, recursive = FALSE,
                             ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
#Check file exists in path
if (length(list_filespath) != 0) {
  #creating new directory on each day for SQL output 
  newdir <- paste("H:/Temp/Blog/ImportExportExcel/BusinessReports/Report", Sys.Date(), sep = "_")
  dir.create(newdir)
  foreach(m = 1:length(list_filespath)) %do%
  {
  #forming the powershell command to invoke sql files  
  command1 <- "Invoke-Sqlcmd -ServerInstance 'LAPTOP-LCUB6HBB' -Database WideWorldImporters  -InputFile "
  command3 <- "| Export-Csv 'H:/Temp/Blog/ImportExportExcel/Temp/Stagingfile.txt'"
  command <- paste(command1,list_filespath[m],command3, sep="")
  system2("powershell", args = command)
  #powershell Invoke-sqlcmd output to staging file in .txt format
  StagingOut <- read.table('H:/Temp/Blog/ImportExportExcel/Temp/Stagingfile.txt', skip = 1, header =TRUE, sep =',')
  #create excel output using .txt staging file
  ExcelFilename <- paste(newdir,"/",gsub(".sql", "", list_filenames[m]),".xlsx", sep="")
  write.xlsx2(StagingOut, file = ExcelFilename, sheetName=list_filenames[m],
              col.names=TRUE,  append=FALSE, password= "Password123")
  }
}
#Delete staging file (intermediate file before exporting to excel) if it exist
if (file.exists('H:/Temp/Blog/ImportExportExcel/Temp/Stagingfile.txt')) 
{
  file.remove('H:/Temp/Blog/ImportExportExcel/Temp/Stagingfile.txt')
}



See Also:

SQL Server Export/Import excel using R script


Happy new year 2019 !!

SQL Server Export/Import excel using R script

In this post, let us see a new approach to import Excel into SQL Server and export SQL Server data to Excel. There are some existing methods to do this using BCP, BULK INSERT, the Import & Export wizard in SSMS, SSIS, Azure Data Factory, linked server & OPENROWSET queries, and SQLCMD.

The BULK INSERT statement, the BCP tool, and Azure Data Factory can't read Excel files directly.
BCP - a workaround has to be done to include the header.
SSIS - with a dynamic source & destination, handling the mapping increases the complexity of the package.
SQLCMD - cannot export output to Excel.

The R & Python language extensions were introduced in SQL Server 2016 & 2017 as part of machine learning. With support for R in Azure SQL Database and Java language extension support in SQL Server 2019, this new approach can be used extensively, as it is easy, fast and flexible.




The below series of posts has some examples of how to install various R packages (xlsx - read/write Excel, and readxl - used for reading Excel).

I have also shown examples of:

i) how to export SQL Server data into an Excel file with multiple tabs, with table names as sheet names

ii) how to export SQL Server data into multiple Excel files, with the table name as file & sheet name

iii) how to import an Excel file with multiple sheets into different SQL Server tables, creating the target schema dynamically with the sheet name as the table name

iv) how to import multiple Excel files in a directory (including sub-folders) with multiple sheets into different SQL Server tables, creating the target schema dynamically with file name + sheet name as the table name

v) how to export the output of SQL files stored in a folder to Excel files


SQL Server Export/Import excel using R script - Part 1
SQL Server Export/Import excel using R script - Part 2
SQL Server Export/Import excel using R script - Part 3
SQL Server Export/Import excel using R script - Part 4
SQL Server Export/Import excel using R script - Part 5
SQL Server Export/Import excel using R script - Part 6

The above examples are done using the base R packages & functions; very few packages are explicitly installed to achieve some Excel operations.

With respect to moving data (import/export) between SQL Server & Excel, there are various scenarios based on each requirement. I have covered some of them, but tweaking the solutions mentioned above can cover any scenario.

I'll keep updating this article with different examples. If anyone has specific requirement or any issues trying above examples, please comment.

See Also:

SQL Server export adhoc SQL or table to CSV file

SQL Server export adhoc SQL or table to CSV file

In this post, let us see how to export ad hoc SQL query output, or a list of tables and views, to a CSV file. I am going to make use of an R script to do this.

I have tried this example with SQL Server 2019 & WideWorldImporters sample database.

For this approach, I have created a scalar function and a stored procedure.

Scalar function - returns a unique table name for a session. This function is required only if an ad hoc SQL query has to be exported to a CSV file.

Stored procedure - has the below three input parameters and writes the output to a CSV file:
i)   @ExportPath - path for exporting the CSV file
ii)  @ObjectlisttoExport - list of tables and views to be exported
iii) @Objectlistdelim - delimiter for the above object list
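Once the function and procedure below are created, a call like the following exports each listed object to its own CSV file (the path and object names here are just examples):

```sql
EXEC usp_ExportCSV
     @ExportPath         = N'H:\Temp\Blog\ImportExportExcel\'
    ,@ObjectlisttoExport = N'Sales.Orders,Sales.Invoices'
    ,@Objectlistdelim    = ','
```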



This script works on SQL Server 2017 and above. It can actually be made to work on SQL Server 2016 and above by modifying the STRING_AGG call used inside the stored procedure to combine the list of columns into a comma-separated string. The approach will not work on versions below SQL Server 2016, as executing the R language from T-SQL was introduced in SQL Server 2016.
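For SQL Server 2016, the STRING_AGG aggregation inside the procedure could be replaced with the classic FOR XML PATH + STUFF pattern; below is a sketch against the same @TableColList table variable used in the procedure.

```sql
-- SQL Server 2016-compatible replacement for the STRING_AGG aggregation:
-- build the comma-separated column list per table with FOR XML PATH + STUFF
DROP TABLE IF EXISTS #ExportTablesList
SELECT IDENTITY(INT,1,1) Rn
      ,Tbl
      ,STUFF((SELECT ',' + C.Cols
              FROM @TableColList C
              WHERE C.Tbl = T.Tbl
              FOR XML PATH('')), 1, 1, '') AS Cols
INTO #ExportTablesList
FROM @TableColList T
GROUP BY Tbl
```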


Scalar function:

USE [WideWorldImporters]
GO

CREATE OR ALTER FUNCTION fngetcsvtblname()
RETURNS NVARCHAR(128)
AS
BEGIN
DECLARE @tblname NVARCHAR(128)
SELECT @tblname = CONCAT('ExportCSV_',@@SPID,DB_ID(),'_temptbl')
RETURN @tblname
END;
GO





Stored procedure:




CREATE OR ALTER PROC usp_ExportCSV (@ExportPath NVARCHAR(MAX),
@ObjectlisttoExport NVARCHAR(MAX),
@Objectlistdelim CHAR(1)
)
AS
BEGIN
SET NOCOUNT ON;

IF ISNULL(@ExportPath,'') <> ''
BEGIN

SELECT @ExportPath = REPLACE(@ExportPath,'\','/')

DECLARE @tblname NVARCHAR(128)
SELECT @tblname= dbo.fngetcsvtblname()

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tblname) OR (ISNULL(@ObjectlisttoExport,'') <> '' AND ISNULL(@Objectlistdelim,'') <> '')
BEGIN


DECLARE @TableColList TABLE (Cols NVARCHAR(MAX),Tbl NVARCHAR(128))

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tblname)
BEGIN
INSERT @TableColList
SELECT CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) Columns -- To cover poor data type conversions
,CONCAT(S.name,'.',O.name) TableName
FROM sys.objects O
JOIN sys.schemas S
ON S.schema_id = O.schema_id
JOIN sys.columns C
ON O.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE CONCAT(S.name,'.',O.name) = (SELECT CONCAT(schema_name(schema_id),'.',name ) FROM sys.tables WHERE name = @tblname)
AND TY.name <> 'image' -- Ignore the datatypes that are not required to be exported
AND O.type IN ('U','V')
END
IF ISNULL(@ObjectlisttoExport,'') <> ''
BEGIN
INSERT @TableColList
SELECT CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) Columns -- To cover poor data type conversions
,CONCAT(S.name,'.',O.name) TableName
FROM sys.objects O
JOIN sys.schemas S
ON S.schema_id = O.schema_id
JOIN sys.columns C
ON O.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE CONCAT(S.name,'.',O.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoExport, @Objectlistdelim))
AND TY.name <> 'image' -- Ignore the datatypes that are not required to be exported
AND O.type IN ('U','V')
END

DROP TABLE IF EXISTS #ExportTablesList
SELECT IDENTITY(INT,1,1) Rn
,tbl
,STRING_AGG(cols,',') AS cols
INTO #ExportTablesList
FROM @TableColList
GROUP BY tbl

DECLARE @I INT = 1
,@SQL NVARCHAR(MAX) = N''
,@RScript NVARCHAR(MAX) = N''
,@filename NVARCHAR(MAX) = N''

WHILE @I <= (SELECT MAX(Rn) FROM #ExportTablesList)
BEGIN
--just for testing selecting top 10, this can be removed
SELECT @SQL = CONCAT('SELECT TOP 10 ',Cols,' FROM ',tbl,';')
,@tblname = Tbl
FROM #ExportTablesList WHERE Rn = @I


SELECT @tblname = REPLACE(@tblname,'.','_')
SELECT @filename = CASE WHEN SUBSTRING (@ExportPath,LEN(@ExportPath),1) = '/' THEN CONCAT(@ExportPath,@tblname,'.csv')
ELSE CONCAT(@ExportPath,'/',@tblname,'.csv') END

SET @RScript = CONCAT('write.csv(InputDataSet, file = "',@filename,'")')

EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL

SET @tblname = ''
SET @filename = ''
SET @I = @I + 1
END

--After successful processing, dropping the table created for exporting adhoc sql into csv
SELECT @tblname= dbo.fngetcsvtblname()
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tblname)
BEGIN
EXEC('DROP TABLE '+@tblname)
END

END ELSE PRINT 'No object specified for exporting to CSV or Objectlist params are empty'

END ELSE PRINT 'Export folder path needs to be mentioned'

END;
GO


Let us see the different execution results:

1) When all the parameters are passed blank



2) When folder path is passed but no objects specified for exporting


3) Notice in the below screenshot, I am getting a unique table name using scalar function and inserting my adhoc sql output into that table and then calling the stored procedure.

This stored procedure exports the adhoc sql output to csv file in the mentioned path and drops the table created after successful processing.


4) I am passing the list of tables from Sales schema in WideWorldImporters database.
This stored procedure exports the list of tables data into separate csv files with table name as file name in the mentioned path.


5) This is to show that Stored procedure can export adhoc sql output, list of tables, views data into separate csv files simultaneously.


Sample execution code block:


SELECT dbo.fngetcsvtblname() --function to get unique table name for inserting adhoc sql into a table
DECLARE @tblname NVARCHAR(128) = dbo.fngetcsvtblname()
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tblname)
BEGIN
EXEC('DROP TABLE '+@tblname)
END

/** place your adhoc sql to be exported into csv and SELECT INTO tablename provided by above function **/
SET NOCOUNT ON;
SELECT TOP 5 OL.OrderLineID,O.*
INTO ExportCSV_726_temptbl
FROM [WideWorldImporters].[Sales].[Orders] O
JOIN [Sales].[OrderLines] OL
ON O.OrderID = OL.OrderID

/** place your adhoc sql to be exported into csv and SELECT INTO tablename provided by above function **/

EXEC usp_ExportCSV 'H:\Temp\Blog\ImportExportExcel','Sales.Orders|Sales.Invoices|Website.Customers','|'



See Also:

Import csv or text file into SQL Server table

In this post, let us see how to import csv or text files into a SQL Server table. If you want to import some random files on an adhoc basis, you can make use of the Import Flat File Wizard from SQL Server Management Studio.

If the requirement is to import on a regular basis or to import multiple files simultaneously, then we can do this using a T-SQL script that executes an R script to read csv or text files & load them into SQL Server tables.

In my previous post on SQL Server export adhoc SQL or table to CSV file, I exported a few csv files. I am going to make use of those three csv files for this example.




I have also created a database called "Staging" to load the csv files into different SQL tables, created dynamically with the csv filename as the table name.

The below screenshot shows that executing the below script imports the csv files as SQL Server tables in the Staging database.




DECLARE @RScript NVARCHAR(MAX) =N' 
    library(foreach);
    sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

    list_filespath <- list.files(path = "H:/Temp/Blog/ImportExportExcel", pattern = "*.csv", all.files = FALSE,
           full.names = TRUE, recursive = TRUE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
    list_filenames <- list.files(path = "H:/Temp/Blog/ImportExportExcel", pattern = "*.csv", all.files = FALSE,
                        full.names = FALSE, recursive = TRUE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)

  #Check file exists in path
    if (length(list_filespath) != 0) {

 foreach(m = 1:length(list_filespath)) %do%
 {
   
  sqlTable <- strsplit(list_filenames[m],split = ".csv") 
  sqlTable <- gsub("\\s", "", sqlTable)
  sqlTable <- gsub("[^[:alnum:]]", "", sqlTable)

  Output <- read.csv(file=list_filespath[m], header=TRUE, sep=",")
  sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
  rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)

 }
 }'

EXEC   sp_execute_external_script
      @language = N'R'

     ,@script = @RScript

See Also:

T-SQL : Search for string or phrase in SQL Server database

In this post, let us see how to search for a string / phrase in a SQL Server database using a hybrid solution of the T-SQL LIKE operator & the R grep function. Currently the options that exist in SQL Server to perform a search operation are 
  1. LIKE operator 
  2. Using string functions CHARINDEX, PATINDEX 
  3. Full text search 
Consider the below example: to search for and return only records containing the standalone string "VAT". The expected result is records 1, 5 & 6.

DECLARE @Tmp TABLE (Id INT, Descrip VARCHAR(500))
INSERT @Tmp SELECT 1,'my VAT calculation is incorrect'
INSERT @Tmp SELECT 2,'Private number'
INSERT @Tmp SELECT 3,'Innnovation model'
INSERT @Tmp SELECT 4,'ELEVATE'
INSERT @Tmp SELECT 5,'total VAT'
INSERT @Tmp SELECT 6,'VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE 'VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE '%VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE '%VAT%'
SELECT * FROM @Tmp WHERE Descrip LIKE '% VAT %'
SELECT * FROM @Tmp WHERE Descrip LIKE '% VAT'






As shown in the above example, there is no straightforward option to do an exact search on a string using the first two options mentioned above. However, it is possible with the third option, using the Full text CONTAINS predicate. A full text catalog, unique index & full text index have to be created on the table on which the search operation needs to be performed.
If the exact search of a string needs to be performed on the entire database, then creating a full text catalog, unique index & full text index on each and every table won't be a viable option.
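For comparison, the full-text route would look roughly like the below. This is a sketch only; it assumes a full text catalog and full text index already exist on the table, and the table name is illustrative:

```sql
-- Exact word search via full text search; matches "VAT" as a whole word only
SELECT Id, Descrip
FROM dbo.SearchDemo
WHERE CONTAINS(Descrip, '"VAT"');
```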

With the hybrid approach [T-SQL LIKE operator & R grep function], let us see various search types that can be performed [Pattern Search, Exact Search, Multi pattern search and other search scenario's  – based on collation, case sensitive/insensitive search and complex wildcard search].


Pre-requisites

We have used SQL Server 2019 evaluation edition on Windows 10 64-bit and the WideWorldImporters SQL Server sample database for this example. In this example, we have made use of R Services installed as part of SQL Server. Install R Services and then, from SSMS, enable the external scripting feature. Restart the database engine and then verify the installation as mentioned in MSDN.
The below script / approach will work from SQL Server 2016 and above (as execution of the R language using T-SQL was introduced in SQL Server 2016). Also please note, no additional R packages need to be installed for this approach.
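Enabling the external scripting feature and verifying the R installation can be done with the below commands (run with sysadmin rights, and restart the database engine after reconfiguring):

```sql
-- Enable execution of external scripts (R / Python)
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- After restarting the instance, verify that R executes through T-SQL
EXEC sp_execute_external_script
      @language = N'R'
     ,@script = N'OutputDataSet <- InputDataSet'
     ,@input_data_1 = N'SELECT 1 AS Col'
WITH RESULT SETS ((Col INT));
```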


Explanation on how this approach works


A stored procedure named "usp_SearchString" has been created. This stored procedure can perform normal T-SQL LIKE operations as well as search strings using the R grep function; the behaviour is controlled through an input parameter.
Output of the search operation will be stored in a table named "Tbl_SearchString". The output will also be displayed at the end of stored procedure execution.
Below are the various input parameters of the stored procedure and their usage details:
Parameter - Description
@SearchString - String to be searched
@SearchType - ES = Exact Search using R; PS = Pattern Search using R; MPS = Multi Pattern Search (OR condition) using R; NTLS = Normal T-SQL Like Search
@Match - 0 = LIKE search, 1 = NOT LIKE search
@IgnoreCase - 1 = case insensitive search, 0 = case sensitive search (if @IgnoreCase IS NULL then default: case insensitive search)
@SearchSQLMetadata - Search SQL definitions for presence of the input string. 1 = Search, 0 = Don't search
@SchemaName - List of objects to be searched that fall under the schema (multiple schemas can be passed, separated by comma)
@ObjectlisttoSearch - List of objects to be searched (multiple tables can be passed, separated by comma)
@SearchCollate - For @SearchType = NTLS if @IgnoreCase = 0. To search based on a particular collation; default - COLLATE Latin1_General_CS_AS

If both @ObjectlisttoSearch & @SchemaName are blank then the entire database is searched, including SQL object definitions

Please note:
@SearchType  = ES, PS, MPS uses R script to search string
@SearchType  = NTLS does the normal T-SQL LIKE search

Multi string search should always be delimited by pipe. 
@ObjectlisttoSearch, @SchemaName should always be delimited by comma if multiple values specified.

Script

USE [WideWorldImporters]
GO
  
--Note : Before compiling this SP, search for sqlConnString and provide Databasename, username & password for R SQL connection
CREATE OR ALTER PROC usp_SearchString (  @SearchString NVARCHAR(MAX),
 @SearchType VARCHAR(4),
 @Match BIT,
 @IgnoreCase BIT,
 @SearchSQLMetadata CHAR(1),
 @SchemaName NVARCHAR(50),
 @ObjectlisttoSearch NVARCHAR(MAX),
 @SearchCollate NVARCHAR(500)
 )
  
/*************************************************************************
=================
INPUT PARAMETERS:
=================
  
@SearchString - String to be searched
  
@SearchType  - ES - Exact Search using R
 PS - Pattern Search using R
 MPS - Multi Pattern Search - OR condition using R
 NTLS - Normal T-SQL Like Search
  
@Match - 0 = LIKE Search, 1 = NOT LIKE Search
  
@IgnoreCase - 1 = case insensitive search, 0 = Case sensitive search (If @IgnoreCase IS NULL then default : case insensitive search)
  
@SearchSQLMetadata - Search sql definitions for presence of input string. 1 = Search, 0 = Don't Search
  
@SchemaName  - List of objects to be searched that fall under schema (Multiple schema's can be passed, separated by Comma)
  
@ObjectlisttoSearch  - List of objects to be searched (Multiple table's can be passed, separated by Comma)
  
--IF BOTH @ObjectlisttoSearch & @SchemaName ARE BLANK THEN ENTIRE DATABASE IS SEARCHED INCLUDING SQL DEFINITIONS
  
@SearchCollate - For @SearchType = NTLS if @IgnoreCase  = 0. To search based on particular collation, default - COLLATE Latin1_General_CS_AS
*****************************************************************************/
  
AS
BEGIN
SET NOCOUNT ON;
  
IF @SearchType IN ('ES','PS','MPS','NTLS')
  
 BEGIN
  
 DECLARE @ExecutedBy NVARCHAR(200) = CURRENT_USER
 DECLARE @Serv NVARCHAR(200) = CONCAT(CHAR(39),CHAR(39),@@SERVERNAME,CHAR(39),CHAR(39))
  
  
IF ISNULL(@SchemaName,'') <> ''  OR ISNULL(@ObjectlisttoSearch,'') <> ''
 BEGIN
  
/**** List of table columns to be searched  ****/
DECLARE @TableColList TABLE (Cols NVARCHAR(MAX),colname NVARCHAR(200),Tbl NVARCHAR(128),TblCol NVARCHAR(100),ColType NVARCHAR(150))
INSERT @TableColList
SELECT
 CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp')
 THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
 ELSE C.name END Columns  -- To cover poor data type conversions when passed to R dataframe
 ,C.name
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name,'.',C.name) TblCol
 ,TY.name
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
-- Ignore the datatypes that are not required
WHERE TY.name NOT IN ('geography','varbinary','binary','text','ntext','image','hierarchyid','xml','sql_variant') -- Ignore the datatypes that are not required
AND (Schema_name(T.schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
OR CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoSearch, ',')))
  
 END ELSE
  
 BEGIN
  
 INSERT @TableColList
SELECT
 CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp')
 THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
 ELSE C.name END Columns  -- To cover poor data type conversions when passed to R dataframe
 ,C.name
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name,'.',C.name) TblCol
 ,TY.name
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
-- Ignore the datatypes that are not required
WHERE TY.name NOT IN ('geography','varbinary','binary','text','ntext','image','hierarchyid','xml','sql_variant')
  
 END
  
DROP TABLE IF EXISTS #ExportTablesList
CREATE TABLE #ExportTablesList (Rn BIGINT IDENTITY(1,1),cols NVARCHAR(500),colname NVARCHAR(200),tbl NVARCHAR(200),ColType NVARCHAR(200))
  
IF @SearchSQLMetadata = 1 OR (@SearchSQLMetadata <> 0 AND (ISNULL(@SchemaName,'') = ''  AND ISNULL(@ObjectlisttoSearch,'') = ''))
 BEGIN
  
 INSERT #ExportTablesList (cols,tbl,ColType) SELECT'CONCAT(''<'',object_schema_name(sm.object_id),''.'',object_name(sm.object_id),''|'',o.type_desc COLLATE Latin1_General_100_CI_AS,''>'',sm.definition) AS definition'
 ,'sys.sql_modules AS sm  JOIN sys.objects AS o ON sm.object_id = o.object_id'
 ,'sql_modules'
  
 END
  
--Deduplication of object list
;WITH dedup
AS
(
SELECT *,ROW_NUMBER()OVER(PARTITION BY Tbl,Cols ORDER BY Cols) Rn FROM @TableColList
)
INSERT INTO #ExportTablesList
SELECT cols,colname,tbl,ColType FROM dedup
WHERE Rn = 1
AND tbl <> 'dbo.Tbl_SearchString'
  
 /**** List of table columns to be searched  ****/
  
 IF (SELECT COUNT(1) FROM #ExportTablesList) <> 0
  
 BEGIN
  
 --Table to hold search output
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Tbl_SearchString')
BEGIN
CREATE TABLE [dbo].[Tbl_SearchString] (
[RunId] FLOAT,
[SearchIndex] BIGINT,
[SearchValue] NVARCHAR(MAX),
[NoOfOccurance] FLOAT,
[ObjectName] NVARCHAR(200),
[ColumnNameORDefinition] NVARCHAR(200),
[SqlDatatype] NVARCHAR(200),
[InputParameter] NVARCHAR(800),
[ExecutedBy] NVARCHAR(200),
[ExecutedAt] DATETIME
)
  
END
  
DECLARE @RunId FLOAT
SELECT @RunId = COALESCE(MAX([RunId]),0)+1 FROM [dbo].[Tbl_SearchString]
  
--Processing to store input parameters
DECLARE @Input NVARCHAR(MAX) = CONCAT(
 '@SearchString > ' ,CASE WHEN @SearchString = '' OR @SearchString IS NULL THEN 'NULL' ELSE @SearchString END
,',@SearchType > '  ,CASE WHEN @SearchType = '' OR @SearchType IS NULL THEN 'NULL' ELSE @SearchType END
,',@Match > ' ,COALESCE(@Match,0)
,',@IgnoreCase > '  ,COALESCE(@IgnoreCase,1)
,',@SearchSQLMetadata > '  ,CASE WHEN @SearchSQLMetadata = '' OR @SearchSQLMetadata IS NULL THEN 'NULL' ELSE @SearchSQLMetadata END
,',@SchemaName > '  ,CASE WHEN @SchemaName = '' OR @SchemaName IS NULL THEN 'NULL' ELSE @SchemaName END
,',@ObjectlisttoSearch > '  ,CASE WHEN @ObjectlisttoSearch = '' OR @ObjectlisttoSearch IS NULL THEN 'NULL' ELSE @ObjectlisttoSearch END)
  
--By default case insensitive search
SELECT @IgnoreCase = COALESCE(@IgnoreCase,1)
  
--By default LIKE search
SELECT @Match = COALESCE(@Match,0)
  
IF @SearchType = 'NTLS'
  
BEGIN
  
DECLARE @SearchStrings TABLE (Id INT IDENTITY(1,1),String NVARCHAR(MAX))
  
INSERT @SearchStrings
SELECT value FROM STRING_SPLIT(@SearchString, '|')
  
 UPDATE #ExportTablesList SET Tbl = 'sys.sql_modules', colname = 'definition'
 WHERE ColType = 'sql_modules'
  
 SET @SearchCollate = CASE WHEN @SearchCollate = '' THEN NULL ELSE @SearchCollate END
  
 DECLARE @COLLATE NVARCHAR(100)
 SET @COLLATE = CASE WHEN @IgnoreCase = 0 THEN CASE WHEN @SearchCollate = '' OR @SearchCollate IS NULL THEN ' COLLATE Latin1_General_CS_AS '
 ELSE CONCAT(' COLLATE ',@SearchCollate,'') END
 ELSE CHAR(32) END
  
 DECLARE @SearchOperator NVARCHAR(100)
 SET @SearchOperator = CASE WHEN @Match = 1 THEN ' NOT LIKE ' ELSE ' LIKE ' END
  
 DECLARE @WHEREClause NVARCHAR(MAX)
  
;WITH CTE
AS
(
SELECT  'SearchValue '+ @SearchOperator +''''+String+''''+@COLLATE WhereClause  FROM @SearchStrings
)
  
SELECT @WHEREClause = STUFF(
(SELECT  ' OR ' + WhereClause FROM
(SELECT WhereClause FROM CTE ) AS T FOR XML PATH('')),2,2,'')
  
END
  
SET @SearchString  = CASE WHEN @SearchType = 'ES' THEN REPLACE(@SearchString,'"','') ELSE @SearchString END
  
 /**** Loop through above Objects list and execute R script ****/
 DECLARE @I INT = 1
 ,@SQL NVARCHAR(MAX) = N''
 ,@RScript NVARCHAR(MAX) = N''
 ,@tblname NVARCHAR(128)
 ,@Colname NVARCHAR(200)
 ,@Sqltype NVARCHAR(100)
  
 WHILE @I <= (SELECT MAX(Rn) FROM #ExportTablesList)
 BEGIN
  
 SELECT @SQL = CONCAT('SELECT ',Cols,' FROM ',tbl)
 ,@tblname = Tbl
 ,@Colname = CASE WHEN @SearchType IN ('ES','PS') THEN cols ELSE colname END
 ,@Sqltype = ColType
 FROM #ExportTablesList WHERE Rn = @I
  
IF @SearchType IN ('ES','PS','MPS')
  
BEGIN
  
SET @RScript = '
 #Provide DB credential detail for storing output in a table
 sqlConnString <- "Driver=SQL Server;Server=serv; Database=WideWorldImporters;Uid=sa;Pwd=password"
  
#function to count no of occurrences
 countCharOccurrences <- function(char,string,Type) {
 if (Type == "ES")
 {
 Boundchar <- paste0("\\b",char,"\\b")
 string1 <- gsub(Boundchar,"",string,ignore.case=IgnoreCase)
 } else {
 string1 <- gsub(char,"",string,ignore.case=IgnoreCase)
 }
 return ((nchar(string) - nchar(string1))/nchar(char))
 }
  
 #getting input dataset column name into a variable "c"
 c <- colnames(InputDataSet)
  
 if (SearchType == "ES")
 {
 ExactString <- paste0("\\b",SearchString,"\\b",sep ="")
 Output <-  as.data.frame(grep(ExactString,InputDataSet[[c]],ignore.case = IgnoreCase,invert = Match))
 colnames(Output)[1] <- "SearchIndex"
 Output$SearchValue <- grep(ExactString,InputDataSet[[c]],ignore.case = IgnoreCase,value = TRUE,invert = Match)
 Output$NoOfOccurance <- countCharOccurrences(SearchString,Output$SearchValue,SearchType)
 }
  
 if (SearchType == "PS" || SearchType == "MPS")
 {
 Output <-  as.data.frame(grep(SearchString,InputDataSet[[c]],ignore.case = IgnoreCase,invert = Match))
 colnames(Output)[1] <- "SearchIndex"
 Output$SearchValue <- grep(SearchString,InputDataSet[[c]],ignore.case = IgnoreCase,value = TRUE,invert = Match)
 if (SearchType == "PS") {
 Output$NoOfOccurance <- countCharOccurrences(SearchString,Output$SearchValue,SearchType) }
 }
  
 Output$ObjectName <- rep(tblname,nrow(Output))
 Output$ColumnNameORDefinition <- rep(c,nrow(Output))
 Output$SqlDatatype <- rep(Sqltype,nrow(Output))
 Output$ObjectName[Output$SqlDatatype == "sql_modules"] <- "sql_modules"
 Output$InputParameter <- rep(Input,nrow(Output))
 Output$ExecutedBy <- rep(ExecutedBy,nrow(Output))
 Output$ExecutedAt <- rep(format(Sys.time(),usetz = FALSE),nrow(Output))
 Output$RunId <- rep(RunId,nrow(Output))
  
 sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = "Tbl_SearchString")
 rxDataStep(inData = Output, outFile = sqlDS,append = "rows")
 '
  
EXEC  sp_execute_external_script
 @language = N'R'
 ,@script = @RScript
 ,@input_data_1 = @SQL
 ,@params = N'@SearchString NVARCHAR(MAX),@SearchType VARCHAR(4),@Match BIT,@IgnoreCase BIT,@Input NVARCHAR(MAX)
 ,@tblname NVARCHAR(128),@Sqltype NVARCHAR(150),@ExecutedBy NVARCHAR(200),@RunId FLOAT
 ,@Serv NVARCHAR(200)'
 ,@SearchString = @SearchString
 ,@SearchType = @SearchType
 ,@Match = @Match
 ,@IgnoreCase = @IgnoreCase
 ,@Input = @Input
 ,@tblname = @tblname
 ,@Sqltype = @Sqltype
 ,@ExecutedBy = @ExecutedBy
 ,@RunId = @RunId
 ,@Serv = @Serv
  
END
  
IF @SearchType = 'NTLS'
  
BEGIN
  
INSERT [dbo].[Tbl_SearchString]([RunId],[SearchIndex],[SearchValue],[ObjectName]
 ,[ColumnNameORDefinition],[SqlDatatype],[InputParameter],[ExecutedBy],[ExecutedAt])
EXEC ('SELECT '+@RunId+',SearchIndex,SearchValue,'''+@tblname+''','''+@Colname+''','''+@Sqltype+''','''+@Input+''','''+@ExecutedBy+''', GETDATE()
FROM (SELECT ROW_NUMBER()OVER(ORDER BY (SELECT 1)) SearchIndex,'+@Colname+' AS SearchValue FROM '+@tblname+
' ) Tmp WHERE '+@WHEREClause)
  
END
  
SET @I = @I + 1
END
  
 /**** Loop through above table list and execute R script ****/
  
 --Display final search result
 SELECT * FROM [dbo].[Tbl_SearchString] WHERE RunId = @RunId AND ExecutedBy = CURRENT_USER
  
  
 END
  
 ELSE
  
 SELECT 'No valid objects passed in the InputParameter to search the string' AS InvalidParameter
  
 END
  
 ELSE
  
 SELECT 'SearchType parameter is mandatory ES - Exact Search, PS - Pattern Search,MPS - Multi Pattern Search - OR condition
 ,NTLS - Normal T-SQL Like Search' AS InvalidParameter
  
 END


Sample execution code block:

EXEC usp_SearchString         @SearchString = 'VAT'
            ,@SearchType = 'ES'
  ,@Match = 0  -- 0 = LIKE, 1 = NOT LIKE
  ,@IgnoreCase = 1  -- 1 = Case insensitive, 0 = Case Sensitive
  ,@SearchSQLMetadata= 0 -- 1 = Search, 0 = Don't Search
  ,@SchemaName = ''
  ,@ObjectlisttoSearch = 'dbo.Tmp'
  ,@SearchCollate = ''


Sample script execution screenshots

1. Search string in table from DB 

Example 1: If we want to search for a string "Ava" on Application.People table from WideWorldImporters database, we can try by setting parameters values as shown below :




2. Exact search in table from DB

In example 1, we did a pattern search. If we want to do an exact search for the string "Ava" on the Application.People table from the WideWorldImporters database, we can try setting the parameter values as shown below :



3. Exact case sensitive search

Example 3: In example 2, we did an exact search. If we want to do an exact case sensitive search for the string "Ava" on the Application.People table from the WideWorldImporters database, we can try setting the parameter values as shown below :



4. Exact case sensitive search (NTLS)

If we want to do an exact case sensitive search for the string "male" on the Purchasing.PurchaseOrderLines table from the WideWorldImporters database, we can try setting the parameter values as shown below :



5. Exact case sensitive search (Using R)

In example 4, it returned two rows (records that contain both "male" and "female") as @SearchType was set to NTLS (Normal T-SQL Like search). But we actually expect the result to be one row, the record that contains only "male". If we again do an exact case sensitive search for the string "male" on the Purchasing.PurchaseOrderLines table from the WideWorldImporters database, by setting @SearchType = 'ES' as shown below :




6. Multi string search delimited by pipe

If we want to do a multi string search delimited by pipe (search string "Ava","Amy") on Application.People table from WideWorldImporters database, we can try by setting parameters values as shown below :



Please note the NoOfOccurance field will be populated only for @SearchType = "ES" & "PS" (for single string search without wildcard).

7. Multi string search using normal T-SQL LIKE search

In example 6, multi string search was done using R script. If we want to do the same multi string search using normal T-SQL LIKE search, we can try by setting parameters values as shown below :
Also in this example, let us see how we can search string with specific collation setting.
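Under the hood, a collation-specific NTLS search reduces to T-SQL of the below shape (the table and column names here are illustrative):

```sql
-- Case sensitive multi string LIKE search with an explicit collation
SELECT Id, Descrip
FROM dbo.Tmp
WHERE Descrip LIKE '%Ava%' COLLATE Latin1_General_CS_AS
   OR Descrip LIKE '%Amy%' COLLATE Latin1_General_CS_AS;
```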



8. Fixed pattern search

If we want to do a fixed pattern search, say if we know the string to be searched is a two-letter word, then we can try setting the parameter values as shown below :




9. Search phrase from multiple tables delimited by comma

Example 9: If we want to search for a phrase from multiple tables delimited by comma then we can try by setting parameters values as shown below :





10. Search for a date from multiple schema's delimited by comma

Example 10: If we want to search for a date from multiple schema's delimited by comma then we can try by setting parameters values as shown below :




11. Search string called "Password"  in entire database

Example 11: If we want to search for a string called "Password"  in entire database including SQL object definitions then we can try by setting parameters values as shown below :



12. Wild card search

Example 12: Below example shows how we can do wildcard search when the search is done using R script, refer parameters values as shown below :
To know more about R wild card search using "?","*","^","$", please see the link provided in reference  section.
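As a quick standalone illustration of those R regex anchors (the sample vector is hypothetical, not part of the stored procedure):

```r
x <- c("VAT", "total VAT", "ELEVATE", "my")
grep("^VAT$", x, value = TRUE)   # exact match: only the standalone "VAT"
grep("VAT",   x, value = TRUE)   # pattern match: any value containing "VAT"
grep("^..$",  x, value = TRUE)   # fixed pattern: values exactly two characters long
```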


Tbl_SearchString stores the details about searches made on a string. If we want to see the entire details (all the other fields from a table) of the record that matches the searched string, then we can try as shown below:



--To get details of a particular RunId
SELECT  DISTINCT RunId
 ,[ObjectName]
FROM [WideWorldImporters].[dbo].[Tbl_SearchString]
WHERE RunId = 12
SELECT  A.*
 ,B.*
FROM [WideWorldImporters].[dbo].[Tbl_SearchString] A
JOIN (SELECT row_number()over(order by (SELECT 'A')) Rn,* FROM Warehouse.StockItemHoldings ) B --Change table name
 ON A.SearchIndex = B.Rn
 AND A.ObjectName = 'Warehouse.StockItemHoldings' --Change table name
 AND RunId = 12 --provide run id



See Also:


T-SQL : Data profiling in On-premise SQL Server / Azure SQL database

In this post, let us see how we can do data profiling on On-premise SQL Server / Azure SQL database tables using T-SQL script.

Data profiling provides the below high-level information about the data:

  • Number of rows and size of the data in the object, date for the most recent update of the data and the object schema
  • Number of null records, distinct values for columns and data type of column
  • Minimum, maximum values for numeric columns and check for existence of special characters, leading/trailing spaces for columns of string data type 
Two stored procedures are created to generate the data profiling output. Inside the stored procedures, iteration over each table's columns is performed using system catalog views, applying MIN, MAX and other functions on each column to get the data profiling output.
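The core of that iteration can be sketched as below for a single column (a simplified, hypothetical example with placeholder table and column names; the stored procedures generalise this over all columns returned by the catalog views):

```sql
DECLARE @sql NVARCHAR(MAX);

-- Build a profiling query for one column using the catalog views
SELECT @sql = CONCAT('SELECT COUNT(*) AS TotalRows',
                     ', COUNT(DISTINCT ', QUOTENAME(C.name), ') AS DistinctValues',
                     ', SUM(CASE WHEN ', QUOTENAME(C.name), ' IS NULL THEN 1 ELSE 0 END) AS NullRows',
                     ', MIN(', QUOTENAME(C.name), ') AS MinValue',
                     ', MAX(', QUOTENAME(C.name), ') AS MaxValue',
                     ' FROM ', SCHEMA_NAME(T.schema_id), '.', T.name)
FROM sys.tables T
JOIN sys.columns C ON T.object_id = C.object_id
WHERE T.name = 'Orders' AND C.name = 'OrderDate';  -- placeholder table / column

EXEC sp_executesql @sql;
```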



1) usp_DataProfiling

2) usp_DataProfiling_Metadata




Below are the various input parameters of the stored procedures and their usage details:



1) usp_DataProfiling


Parameter - Description
@Report - 1 = ColumnDataProfiling, 2 = ColumnUniqueValues
@SchemaName - Schema name for which data profiling needs to be done. Multiple values can be passed, separated by comma
@ObjectlisttoSearch - List of tables, separated by comma, for which data profiling needs to be done
@ExcludeTables - Tables to be excluded from the data profiling output. Input format: Schemaname.tablename. Multiple values can be passed, separated by comma
@ExcludeColumns - Columns to be excluded from the data profiling output. Multiple values can be passed, separated by comma
@ExcludeDataType - Data types to be excluded from the data profiling output. Multiple values can be passed, separated by comma. These data types are excluded by default as they are not really relevant for data profiling: 'geography', 'varbinary', 'binary', 'text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant'
@RestrictCharlength - If a field's maximum character length is greater than 100, the resultset shows only one record describing the details of that field; otherwise the resultset shows one record per unique value in that field. This default of 100 can be overwritten using this parameter
@RestrictNoOfUniqueValues - If a field has more than 50 unique values, the resultset shows only one record describing the details of that field; otherwise the resultset shows one record per unique value in that field. This default of 50 can be overwritten using this parameter
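A sample invocation of usp_DataProfiling might look like the below (the parameter values are illustrative):

```sql
EXEC usp_DataProfiling
      @Report = 1                           -- ColumnDataProfiling
     ,@SchemaName = N''
     ,@ObjectlisttoSearch = N'Sales.Orders,Sales.Invoices'
     ,@ExcludeTables = N''
     ,@ExcludeColumns = N''
     ,@ExcludeDataType = N''
     ,@RestrictCharlength = NULL            -- defaults to 100
     ,@RestrictNoOfUniqueValues = NULL      -- defaults to 50
```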




2) usp_DataProfiling_Metadata


Parameter - Description
@Report - 1 = TableStats, 2 = TableColumnMetadata
@SchemaName - Schema name for which data profiling needs to be done. Multiple values can be passed, separated by comma
@ObjectlisttoSearch - List of tables, separated by comma, for which data profiling needs to be done


Data Profiling using T-SQL script:


1) usp_DataProfiling


CREATEORALTERPROCEDUREusp_DataProfiling
 @Report TINYINT ,  --1 = 'ColumnDataProfiling', 2 = 'ColumnUniqueValues'
 @SchemaName NVARCHAR(MAX) = N'',
 @ObjectlisttoSearch NVARCHAR(MAX),
 @ExcludeTables NVARCHAR(MAX) = N'',
 @ExcludeColumns NVARCHAR(MAX) = N'',
 @ExcludeDataType NVARCHAR(100) = N'',
 @RestrictCharlength INT,
 @RestrictNoOfUniqueValues INT
AS
 BEGIN
 SETNOCOUNT ON;
 SETANSI_WARNINGS OFF;
 SETANSI_NULLS ON;
 SELECT@RestrictCharlength = IIF(@RestrictCharlength ISNULLOR@RestrictCharlength = '',100,@RestrictCharlength)
 SELECT@RestrictNoOfUniqueValues = IIF(@RestrictNoOfUniqueValues ISNULLOR@RestrictNoOfUniqueValues = '',50,@RestrictNoOfUniqueValues)
     
DECLARE@TableColList TABLE(Id INTIDENTITY(1,1),Tbl NVARCHAR(128),colname NVARCHAR(200),ColType NVARCHAR(150))
 IF ISNULL(@SchemaName,'') <> ''  ORISNULL(@ObjectlisttoSearch,'') <> ''
 BEGIN
    
INSERT@TableColList
SELECT    DISTINCTCONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
         ,C.name
         ,CASEWHENTY.is_user_defined = 1 THEN(SELECTnameFROMsys.types
                                                 WHEREsystem_type_id = user_type_id
                                                    AND  system_type_id =  TY.system_type_id)
                                            ELSETY.name
          END
FROMSys.tables T
JOINsys.columns C
    ONT.object_id = C.object_id
JOINsys.types TY
    ONC.[user_type_id] = TY.[user_type_id]
-- Ignore the datatypes that are not required
WHERETY.nameNOTIN('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')
    AND(Schema_name(T.schema_id) IN(SELECTvalue FROMSTRING_SPLIT(@SchemaName, ','))
    ORCONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN(SELECTvalue FROMSTRING_SPLIT(@ObjectlisttoSearch, ',')))
    AND(TY.nameNOTIN(SELECTvalue FROMSTRING_SPLIT(@ExcludeDataType, ','))
    ANDTY.name= TY.name)
    AND(C.nameNOTIN(SELECTvalue FROMSTRING_SPLIT(@ExcludeColumns, ','))
    ANDC.name= C.name)
    AND(CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) NOTIN(SELECTvalue FROMSTRING_SPLIT(@ExcludeTables, ','))
    ANDCONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) = CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name))
 END ELSE

 BEGIN

 INSERT @TableColList
SELECT DISTINCT CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
          ,C.name
          ,CASE WHEN TY.is_user_defined = 1 THEN (SELECT name FROM sys.types
                                                 WHERE system_type_id = user_type_id
                                                    AND system_type_id = TY.system_type_id)
                                            ELSE TY.name
          END
FROM sys.tables T
JOIN sys.columns C
    ON T.object_id = C.object_id
JOIN sys.types TY
    ON C.[user_type_id] = TY.[user_type_id]
-- Ignore the data types that are not required
WHERE TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')
    AND (TY.name NOT IN (SELECT value FROM STRING_SPLIT(@ExcludeDataType, ','))
    AND TY.name = TY.name)
    AND (C.name NOT IN (SELECT value FROM STRING_SPLIT(@ExcludeColumns, ','))
    AND C.name = C.name)
    AND (CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) NOT IN (SELECT value FROM STRING_SPLIT(@ExcludeTables, ','))
    AND CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) = CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name))

 END
DROP TABLE IF EXISTS #Final
CREATE TABLE #Final (Id BIGINT IDENTITY(1,1),TableName NVARCHAR(128),ColumnName NVARCHAR(200),ColumnType NVARCHAR(150),ColumnUniqueValues NVARCHAR(MAX),UniqueValueOccurance BIGINT,MissingDataRowCount BIGINT,MinValue NVARCHAR(MAX),MaxValue NVARCHAR(MAX),SpecialCharacters BIGINT,LeadingTrailingSpaces BIGINT,MinFieldValueLen BIGINT,MaxFieldValueLen BIGINT,Comment NVARCHAR(MAX))
DROP TABLE IF EXISTS #temp
CREATE TABLE #temp(Id BIGINT IDENTITY(1,1),TableName NVARCHAR(128),ColumnName NVARCHAR(200),Cnt BIGINT,MaxLen BIGINT,MinLen BIGINT,MissingDataCount BIGINT,MinValue NVARCHAR(MAX),MaxValue NVARCHAR(MAX),SpecialCharacters BIGINT,LeadingTrailingSpaces BIGINT)
DECLARE @I                     INT = 1
       ,@SQL                   NVARCHAR(MAX) = N''
       ,@tblname               NVARCHAR(128)
       ,@Colname               NVARCHAR(200)
       ,@ColType               NVARCHAR(150)
       ,@Cnt                   BIGINT
       ,@MaxLen                BIGINT
       ,@MinLen                BIGINT
       ,@MissingData           BIGINT
       ,@MaxVal                NVARCHAR(MAX) = N''
       ,@MinVal                NVARCHAR(MAX) = N''
       ,@MinMAxSQL             NVARCHAR(MAX) = N''
       ,@SpecialCharacters     BIGINT
       ,@LeadingTrailingSpaces BIGINT
  WHILE @I <= (SELECT MAX(Id) FROM @TableColList)
  BEGIN
  SELECT @Colname = QUOTENAME(colname),@tblname = Tbl,@ColType = ColType FROM @TableColList
  WHERE Id = @I
SELECT @MinMAxSQL = CASE WHEN @ColType IN ('date','datetime','datetime2','datetimeoffset','time','timestamp')
                         THEN CONCAT(' FORMAT (MIN(',@Colname,'), ''yyyy-MM-dd,hh:mm:ss'') MinValue,FORMAT (MAX(',@Colname,'), ''yyyy-MM-dd,hh:mm:ss'') MAXValue')
                         WHEN @ColType = 'bit'
                         THEN '0 AS MinValue,1 AS MaxValue'
                         ELSE CONCAT('CASE WHEN EXISTS (SELECT 1 FROM ',@tblname,' WHERE ISNUMERIC(',@Colname,') = 0) ','THEN NULL ELSE MIN(',@Colname,') END MinValue
                             ,CASE WHEN EXISTS (SELECT 1 FROM ',@tblname,' WHERE ISNUMERIC(',@Colname,') = 0) ','THEN NULL ELSE MAX(',@Colname,') END MAXValue')
                     END
EXEC(';WITH CTE AS (
        SELECT   COUNT_BIG(DISTINCT '+@Colname+') Cnt
                ,MAX(LEN('+@Colname+')) MaxLen
                ,MIN(LEN('+@Colname+')) MinLen
                ,SUM(CASE WHEN '+@Colname+' IS NULL OR CAST('+@Colname+' AS VARCHAR(MAX)) = '''' THEN 1 ELSE 0 END) MissingData
                ,'+@MinMAxSQL+'
                ,CASE WHEN '''+@ColType+''' IN (''nvarchar'',''varchar'',''nchar'',''char'')
                      THEN SUM(CASE WHEN '+@Colname+' LIKE ''%[^a-zA-Z0-9 ]%'' THEN 1 ELSE 0 END)
                      ELSE NULL END SpecialCharacters
                ,CASE WHEN '''+@ColType+''' IN (''nvarchar'',''varchar'',''nchar'',''char'')
                      THEN SUM(CASE WHEN ISNULL(DATALENGTH('+@Colname+'),'''') = ISNULL(DATALENGTH(RTRIM(LTRIM('+@Colname+'))),'''') THEN 0 ELSE 1 END)
                      ELSE NULL END LeadingTrailingSpaces
        FROM '+@tblname+' )
        INSERT #temp(TableName,ColumnName,Cnt,MaxLen,MinLen,MissingDataCount,MinValue,MaxValue,SpecialCharacters,LeadingTrailingSpaces)
        SELECT '''+@tblname+''','''+@Colname+''',Cnt,ISNULL(MaxLen,0) MaxLen,ISNULL(MinLen,0) MinLen,ISNULL(MissingData,0) MissingData,MinValue,MAXValue
        ,ISNULL(SpecialCharacters,0) SpecialCharacters,ISNULL(LeadingTrailingSpaces,0) LeadingTrailingSpaces FROM CTE')

  SELECT @Cnt = Cnt,@MaxLen = MaxLen,@MinLen = MinLen,@MissingData = MissingDataCount,@MinVal = MinValue,@MaxVal = MAXValue
        ,@SpecialCharacters = SpecialCharacters,@LeadingTrailingSpaces = LeadingTrailingSpaces
  FROM #temp
  WHERE Id = @I AND TableName = @tblname AND ColumnName = @Colname
  IF ISNULL(@MaxLen,'') < @RestrictCharlength AND ISNULL(@Cnt,'') < @RestrictNoOfUniqueValues
      BEGIN

      SET @SQL = CONCAT('SELECT ''',@tblname,''',''',@Colname,''',''',@ColType,''',',@Colname,',COUNT_BIG(',@Colname,'),',@MissingData,',''',@MinVal,''',''',@MaxVal,''',',@SpecialCharacters,',',@LeadingTrailingSpaces,',',@MinLen,',',@MaxLen,',','''','This field has Unique values = ',@Cnt,'''',' FROM ',@tblname,' GROUP BY ',@Colname)
      INSERT #Final (TableName,ColumnName,ColumnType,ColumnUniqueValues,UniqueValueOccurance,MissingDataRowCount,MinValue,MaxValue,SpecialCharacters,LeadingTrailingSpaces,MinFieldValueLen,MaxFieldValueLen,Comment)
      EXEC(@SQL)
      END
  ELSE
      BEGIN
      INSERT #Final (TableName,ColumnName,ColumnType,MissingDataRowCount,MinValue,MaxValue,SpecialCharacters,LeadingTrailingSpaces,MinFieldValueLen,MaxFieldValueLen,Comment)
      SELECT @tblname,@Colname,@ColType,@MissingData,@MinVal,@MaxVal,@SpecialCharacters,@LeadingTrailingSpaces,@MinLen,@MaxLen,CONCAT('This field has Unique values = ',@Cnt)
      END
  SET @I = @I + 1
  END
  IF @Report = 1
  BEGIN
  SELECT DISTINCT TableName,ColumnName,ColumnType,MissingDataRowCount,MinValue,MaxValue,SpecialCharacters
                 ,LeadingTrailingSpaces,MinFieldValueLen,MaxFieldValueLen,Comment
  FROM #Final
  ORDER BY TableName,ColumnName
  END

 IF @Report = 2
 BEGIN
  SELECT TableName,ColumnName,ColumnUniqueValues,UniqueValueOccurance,Comment
  FROM #Final
  ORDER BY TableName,ColumnName
 END
  END
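The per-column statistics that the dynamic SQL gathers (distinct count, missing count, min/max, field value lengths) can be sketched in plain Python to make the logic easier to follow. The sample column data and the `profile_column` helper below are hypothetical illustrations, not part of the stored procedure:

```python
# Sketch of the per-column profiling performed by the dynamic SQL above.
# The sample data and the profile_column helper are hypothetical.

def profile_column(values):
    """Return distinct count, missing count, min/max and value lengths for one column."""
    present = [v for v in values if v not in (None, "")]  # NULL or '' counts as missing
    return {
        "UniqueValues": len(set(present)),
        "MissingDataRowCount": len(values) - len(present),
        "MinValue": min(present) if present else None,
        "MaxValue": max(present) if present else None,
        "MinFieldValueLen": min(len(str(v)) for v in present) if present else 0,
        "MaxFieldValueLen": max(len(str(v)) for v in present) if present else 0,
    }

city = ["Perth", "", "Sydney", "Perth", None]
stats = profile_column(city)
print(stats)
```

The stored procedure does the same per column, but pushes the aggregation into SQL Server via EXEC so the data never leaves the database.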


2) usp_DataProfiling_Metadata


CREATE OR ALTER PROCEDURE usp_DataProfiling_Metadata
 @Report TINYINT,  --1 = 'TableStats', 2 = 'TableColumnMetadata'
 @SchemaName NVARCHAR(MAX) = N'',
 @ObjectlisttoSearch NVARCHAR(MAX) = N''
AS
 BEGIN
 SET NOCOUNT ON;
DROP TABLE IF EXISTS #TblList
CREATE TABLE #TblList(Id INT IDENTITY(1,1),TableName NVARCHAR(200))
DROP TABLE IF EXISTS #Tblstats
CREATE TABLE #Tblstats (TableName NVARCHAR(200),NoOfRows NVARCHAR(100),ReservedSpace NVARCHAR(100)
                       ,DataSpace NVARCHAR(100),IndexSize NVARCHAR(100),UnusedSpace NVARCHAR(100)
                       ,LastUserUpdate DATETIME)
 IF ISNULL(@SchemaName,'') <> '' OR ISNULL(@ObjectlisttoSearch,'') <> ''
 BEGIN
INSERT #TblList (TableName)
SELECT CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) TableName
FROM sys.tables
WHERE (Schema_name(schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
    OR CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoSearch, ',')))
 END ELSE
 BEGIN
INSERT #TblList (TableName)
SELECT CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) TableName
FROM sys.tables
 END
DECLARE @Tblstats TABLE(TableName NVARCHAR(200),NoOfRows NVARCHAR(100),ReservedSpace NVARCHAR(100)
                       ,DataSpace NVARCHAR(100),IndexSize NVARCHAR(100),UnusedSpace NVARCHAR(100)
                       )
DECLARE @I                INT = 1
       ,@tblname          NVARCHAR(128) = N''
       ,@last_user_update DATETIME
WHILE @I <= (SELECT COUNT(1) FROM #TblList)
BEGIN
SELECT @tblname = TableName FROM #TblList WHERE Id = @I
INSERT @Tblstats
EXEC sp_spaceused @tblname;
SET @last_user_update = NULL -- reset so a table with no usage stats does not inherit the previous table's value
SELECT TOP 1 @last_user_update = last_user_update
FROM sys.dm_db_index_usage_stats
WHERE object_id = OBJECT_ID(@tblname)
ORDER BY last_user_update DESC
INSERT #Tblstats(TableName,NoOfRows,ReservedSpace,DataSpace,IndexSize,UnusedSpace,LastUserUpdate)
SELECT @tblname,NoOfRows,ReservedSpace,DataSpace,IndexSize,UnusedSpace,@last_user_update
FROM @Tblstats
DELETE FROM @Tblstats
SET @I = @I + 1
END
  IF @Report = 1
  BEGIN
 ;WITH Systbl
 AS
 (
  SELECT DISTINCT CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) TableName
        ,modify_date TableSchema_LastModifyDate
        ,CASE WHEN is_replicated = 1 THEN 'Yes' ELSE 'No' END AS IsReplicated
        ,CASE WHEN is_filetable = 1 THEN 'Yes' ELSE 'No' END AS IsFileTable
        ,CASE WHEN is_memory_optimized = 1 THEN 'Yes' ELSE 'No' END AS IsMemoryOptimized
        ,temporal_type_desc TemporalTypeDesc
        ,CASE WHEN is_remote_data_archive_enabled = 1 THEN 'Yes' ELSE 'No' END AS IsStretchEnabled
        ,CASE WHEN is_external = 1 THEN 'Yes' ELSE 'No' END AS IsExternal
        ,CASE WHEN is_node = 1 OR is_edge = 1 THEN 'Yes' ELSE 'No' END IsGraphTable
 FROM sys.tables ST
 JOIN #TblList T
 ON CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) COLLATE DATABASE_DEFAULT = T.TableName COLLATE DATABASE_DEFAULT
 )
SELECT B.*,A.TableSchema_LastModifyDate
,A.IsMemoryOptimized
,A.IsExternal
,A.IsStretchEnabled
,A.IsFileTable
,A.IsGraphTable
,A.IsReplicated
,A.TemporalTypeDesc
FROM Systbl A
JOIN #Tblstats B
ON A.TableName COLLATE DATABASE_DEFAULT = B.TableName COLLATE DATABASE_DEFAULT
  END
   
 IF @Report = 2
 BEGIN
  SELECT DISTINCT CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
         ,C.name ColumnName
         ,CASE WHEN TY.is_user_defined = 1 THEN (SELECT name FROM sys.types
                                                 WHERE system_type_id = user_type_id
                                                    AND system_type_id = TY.system_type_id)
                                            ELSE TY.name
          END AS DataType
          ,C.max_length
          ,C.precision
          ,C.scale
          ,C.collation_name
          ,CASE WHEN C.is_nullable = 1 THEN 'Yes' ELSE 'No' END AS IsNullable
          ,CASE WHEN C.is_identity = 1 THEN 'Yes' ELSE 'No' END AS IsIdentity
          ,CASE WHEN C.is_masked = 1 THEN 'Yes' ELSE 'No' END AS IsMasked
          ,CASE WHEN C.is_hidden = 1 THEN 'Yes' ELSE 'No' END AS IsHidden
          ,CASE WHEN C.is_computed = 1 THEN 'Yes' ELSE 'No' END AS IsComputed
          ,CASE WHEN C.is_filestream = 1 THEN 'Yes' ELSE 'No' END AS IsFileStream
          ,CASE WHEN C.is_sparse = 1 THEN 'Yes' ELSE 'No' END AS IsSparse
          ,C.encryption_type_desc EncryptionTypeDesc
FROM sys.tables T
JOIN sys.columns C
    ON T.object_id = C.object_id
JOIN sys.types TY
    ON C.[user_type_id] = TY.[user_type_id]
WHERE (Schema_name(T.schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
    OR CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoSearch, ',')))
 END
  END
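The loop in this procedure takes the TOP 1 row ordered by last_user_update descending, i.e. the most recent update time across all index-usage rows for a table. The same "latest value" selection can be sketched in Python; the rows below are hypothetical stand-ins for sys.dm_db_index_usage_stats:

```python
from datetime import datetime

# Pick the most recent last_user_update across all index-usage rows,
# mirroring the TOP 1 ... ORDER BY last_user_update DESC step above.
# These rows are hypothetical stand-ins for sys.dm_db_index_usage_stats.
usage_rows = [
    {"index_id": 1, "last_user_update": datetime(2020, 3, 1, 9, 0)},
    {"index_id": 2, "last_user_update": datetime(2020, 3, 5, 17, 30)},
    {"index_id": 3, "last_user_update": datetime(2020, 2, 28, 8, 15)},
]
last_user_update = max(r["last_user_update"] for r in usage_rows)
print(last_user_update)
```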





Please see my technet wiki article for detailed explanation on this topic with examples and screenshots and you can also download the scripts and SSRS report from technet gallery site.


SQL Server Import/Export to Excel using Python script

Introduction

In a previous article, we saw how to import/export Excel to/from SQL Server by executing an R script within T-SQL.
In this post, let us see a similar approach to import Excel into SQL Server and export SQL Server data to Excel by executing a Python script within T-SQL.

There are existing methods to do this using BCP, BULK INSERT, the Import & Export wizard in SSMS, SSIS, Azure Data Factory, linked server & OPENROWSET queries, and SQLCMD. Each has limitations:

BULK INSERT, the BCP tool, and Azure Data Factory can't read Excel files directly
BCP - a workaround is needed to include the header row while exporting
SSIS - though it supports exporting to Excel, with a dynamic source & destination, handling the mapping between source and target increases the complexity of the package
SQLCMD - cannot export output in Excel file format


The R & Python language extensions were introduced in SQL Server 2016 & 2017 respectively as part of machine learning. With support for R in Azure SQL Database and Java language extension support in SQL Server 2019, this new approach can be used extensively as it is easy, fast and flexible.




Pre-requisites

We have used SQL Server 2019 evaluation edition on Windows 10 64-bit and the WideWorldImporters SQL Server sample database for this example.

The scripts below work on SQL Server 2017 and above, as execution of the Python language using T-SQL was introduced in SQL Server 2017.


Provide folder permission so the external runtime can access the Excel files during the import/export process: right-click the folder -> Properties -> Security -> grant Full Control to "ALL_APPLICATION_PACKAGES".



pandas.DataFrame.to_excel & pandas.read_excel are used to export and import Excel; pandas is installed by default with the Python machine learning components.


We can check that by running below statement in SSMS:

EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'import pip
for i in pip.get_installed_distributions():
    print(i)';

GO
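Note that `pip.get_installed_distributions()` was removed in newer pip releases. Outside SQL Server, a minimal stand-alone equivalent on Python 3.8+ uses the standard-library `importlib.metadata` (this snippet is illustrative and is not part of the stored procedures):

```python
from importlib import metadata

# List installed distribution names using the Python 3.8+ standard library,
# an alternative to the now-removed pip.get_installed_distributions().
installed = sorted(d.metadata["Name"] for d in metadata.distributions() if d.metadata["Name"])
for name in installed:
    print(name)
```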


Exporting SQL Server data (list of tables, views) into Excel file



For this approach, we have created a stored procedure named "usp_ExportExcel" in the "WideWorldImporters" database. Executing the stored procedure with the input parameters below exports SQL Server data to Excel files.

The stored procedure has the below three input parameters and writes the output to Excel files:


Parameter            Description
@ExportPath          Path for exporting Excel files
@SchemaName          List of objects under this schema to be exported. Can have multiple values separated by comma
@ObjectlisttoExport  List of tables, views to be exported. Can have multiple values separated by comma


CREATE OR ALTER PROC usp_ExportExcel (@ExportPath NVARCHAR(MAX),
                                      @SchemaName NVARCHAR(MAX),
                                      @ObjectlisttoExport NVARCHAR(MAX)
                                      )
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY

IF ISNULL(@ExportPath,'') <> ''
 BEGIN
 SELECT @ExportPath = CASE WHEN RIGHT(@ExportPath,1) = '\' THEN @ExportPath ELSE CONCAT(@ExportPath,'\') END
 DECLARE @ValidPath TABLE (ValidPathCheck BIT)
INSERT @ValidPath
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import os
import pandas as pd
d = os.path.isdir(ExportFilePath)
OutputDataSet = pd.DataFrame([d],columns=["Filename"])'
,@params = N'@ExportFilePath NVARCHAR(MAX)'
,@ExportFilePath = @ExportPath
        IF (SELECT ValidPathCheck FROM @ValidPath) = 1
BEGIN
  
 IF ISNULL(@SchemaName,'') <> '' OR ISNULL(@ObjectlisttoExport,'') <> ''
 BEGIN

DROP TABLE IF EXISTS #ExportTablesList, #FinalExportList

CREATE TABLE #ExportTablesList(TableCount INT IDENTITY(1,1),Cols NVARCHAR(MAX),TableName NVARCHAR(200))
--Get the list of objects to be exported
INSERT #ExportTablesList (Cols,TableName)
SELECT CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp','decimal','bit','int','bigint')
              THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
         ELSE C.name END Cols  -- To cover poor data type conversions b/n Python & SQL Server
  ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
-- Ignore the data types that are not required to be exported
AND TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')

INSERT #ExportTablesList (Cols,TableName)
SELECT CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp','decimal','bit','int','bigint')
              THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
         ELSE C.name END Cols  -- To cover poor data type conversions b/n Python & SQL Server
  ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoExport, ','))
-- Ignore the data types that are not required to be exported
AND TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')

--Dedup of object list
;WITH dedup
AS
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY TableName,Cols ORDER BY Cols) Rn FROM #ExportTablesList
)
DELETE FROM dedup
WHERE Rn > 1
--Forming columns list as comma separated
SELECT TableName,IDENTITY(INT,1,1) AS TableCount
    , STUFF(
        (
        SELECT ', ' + C.Cols
        FROM #ExportTablesList AS C
        WHERE C.TableName = T.TableName
        FOR XML PATH('')
        ), 1, 2, '') AS Cols
INTO #FinalExportList
FROM #ExportTablesList AS T
GROUP BY TableName
  
  
DECLARE @I INT = 1
       ,@TableName NVARCHAR(200)
       ,@SQL NVARCHAR(MAX) = N''
       ,@PythonScript NVARCHAR(MAX) = N''
       ,@ExportFilePath NVARCHAR(MAX) = N''
--Loop through the object list to export as Excel
WHILE @I <= (SELECT COUNT(TableName) FROM #FinalExportList)
BEGIN

 -- Just for testing purposes, only the top 10 records are selected
SELECT @SQL = CONCAT('SELECT TOP 10 ',Cols,' FROM ',TableName,';')
      ,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I

SET @PythonScript = N'
FullFilePath = ExcelFilePath+TableName+".xlsx"
InputDataSet.to_excel(FullFilePath,sheet_name=TableName.split(".")[-1],index=False)'

EXEC sp_execute_external_script
      @language = N'Python'
     ,@script = @PythonScript
     ,@input_data_1 = @SQL
     ,@params = N'@ExcelFilePath NVARCHAR(MAX), @TableName NVARCHAR(200)'
     ,@ExcelFilePath = @ExportPath -- file path where Excel files are placed
     ,@TableName = @TableName

SET @I = @I + 1
END

 END ELSE PRINT 'Schema name of objects or list of objects (separated by comma) to be exported needs to be mentioned'
 END ELSE PRINT 'Invalid folder path'
 END ELSE PRINT 'Export folder path needs to be mentioned'
 END TRY
BEGIN CATCH
PRINT 'Issue while executing this SP, please check whether there is permission to execute the script / to access the folder and input params are valid'
END CATCH

END
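The "Forming columns list as comma separated" step above (STUFF ... FOR XML PATH) groups the (table, column) rows by table and joins the columns with ", ". The same shaping can be sketched in plain Python; the sample rows are hypothetical stand-ins for #ExportTablesList:

```python
from itertools import groupby

# Group (table, column) pairs by table and join the columns with ", ",
# mirroring the STUFF ... FOR XML PATH step above.
# The sample rows are hypothetical stand-ins for #ExportTablesList.
rows = [
    ("Sales.Orders", "OrderID"),
    ("Sales.Orders", "OrderDate"),
    ("Sales.Invoices", "InvoiceID"),
]
rows.sort(key=lambda r: r[0])  # groupby requires sorted input
col_lists = {tbl: ", ".join(col for _, col in grp)
             for tbl, grp in groupby(rows, key=lambda r: r[0])}
print(col_lists)
```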




Importing Excel file into SQL Server


For this approach, we have created a stored procedure named "usp_ImportExcel" in the "WideWorldImporters" database. Executing the stored procedure with the input parameters below imports Excel files into SQL Server tables.


The stored procedure has the below six input parameters:


Parameter            Description
@ImportPath          Path where Excel files are placed for importing into SQL Server
@DBConnectionString  Target SQL Server database connection string where files are imported. Can be
                     Database=DB name;Trusted_Connection=True
                     or
                     Database=DB name;Uid=user name;Pwd=Password
@ImportAll           If set to 1 then all files in the mentioned path are imported. If set to 0 then only the mentioned file is imported
@CombineTarget       Flag to decide a single target table for source files with the same structure, or a separate target table for each source file
@ExcelFileName       If @ImportAll = 0 then the Excel file name needs to be passed to this parameter
@ExcelSheetName      If @ImportAll = 0 then the corresponding sheet name of the Excel file needs to be passed to this parameter

CREATE OR ALTER PROC usp_ImportExcel (@ImportPath NVARCHAR(MAX),
                                      @DBConnectionString NVARCHAR(MAX),
                                      @ImportAll BIT,
                                      @CombineTarget BIT,
                                      @ExcelFileName NVARCHAR(200),
                                      @ExcelSheetName NVARCHAR(50)
                                      )
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY

IF ISNULL(@ImportPath,'') <> '' AND ISNULL(@DBConnectionString,'') <> ''
 BEGIN
 SELECT @ImportPath = CASE WHEN RIGHT(@ImportPath,1) = '\' THEN @ImportPath ELSE CONCAT(@ImportPath,'\') END
 DECLARE @Serv NVARCHAR(200) = CONCAT(CHAR(39),CHAR(39),@@SERVERNAME,CHAR(39),CHAR(39))
  DECLARE @ValidPath TABLE (ValidPathCheck BIT)
INSERT @ValidPath
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import os
import pandas as pd
d = os.path.isdir(ImportFilePath)
OutputDataSet = pd.DataFrame([d],columns=["Filename"])'
,@params = N'@ImportFilePath NVARCHAR(MAX)'
,@ImportFilePath = @ImportPath
        IF (SELECT ValidPathCheck FROM @ValidPath) = 1
BEGIN

  IF (@ImportAll = 0 AND (ISNULL(@ExcelFileName,'') <> '' AND ISNULL(@ExcelSheetName,'') <> ''))
  OR (@ImportAll = 1 AND (ISNULL(@ExcelFileName,'') = '' AND ISNULL(@ExcelSheetName,'') = ''))

  BEGIN
        
     
DECLARE @PythonScript NVARCHAR(MAX) = CONCAT('
import pandas as pd
import os
import glob
from revoscalepy import RxSqlServerData, rx_data_step
sqlConnString = "Driver=SQL Server;Server="+Serv+";',@DBConnectionString,'"
Filefolderepath = ImportFilePath+"*.xlsx"
if ImportAll == 1 and CombineTarget == 0:
    for FullFilePath in glob.glob(Filefolderepath):
       Filename = os.path.basename(FullFilePath).replace(".xlsx","")
       xl = pd.ExcelFile(FullFilePath)
       for sheetname in xl.sheet_names:
          Output = pd.read_excel(FullFilePath, sheetname=sheetname, na_filter=False).astype(str)
       if not Output.empty:
          sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "".join(fl for fl in Filename if fl.isalnum())+"_"+"".join(sh for sh in sheetname if sh.isalnum()))
          rx_data_step(input_data = Output, output_file = sqlDS,overwrite = True)
if ImportAll == 1 and CombineTarget == 1:
   df2 = pd.DataFrame()
   for FullFilePath in glob.glob(Filefolderepath):
       Filename = os.path.basename(FullFilePath).replace(".xlsx","")
       xl = pd.ExcelFile(FullFilePath)
       for sheetname in xl.sheet_names:
           Output = pd.read_excel(FullFilePath, sheetname=sheetname).columns.astype(str)
           Output = ",".join(list(Output))
           df1 = pd.DataFrame([[Filename,sheetname,FullFilePath,Output]],columns=["Filename","sheetname","FullFilePath","Headers"])
           df2 = df2.append(df1,ignore_index=True)
   sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "Tbl_PyImpExp1")
   rx_data_step(input_data = df2, output_file = sqlDS,overwrite = True)
if ImportAll == 0:
   Filename = ImportFilePath+ExcelFileName+".xlsx"
   exists = os.path.isfile(Filename)
   if exists and ExcelSheetName in pd.ExcelFile(Filename).sheet_names:
         Output = pd.read_excel(Filename, sheetname=ExcelSheetName, na_filter=False).astype(str)
         if not Output.empty:
             sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "".join(fl for fl in ExcelFileName if fl.isalnum())+"_"+"".join(sh for sh in ExcelSheetName if sh.isalnum()))
             rx_data_step(input_data = Output, output_file = sqlDS,overwrite = True)
   else:
      print("Invalid Excel file or sheet name")')
  
EXEC sp_execute_external_script
      @language = N'Python'
     ,@script = @PythonScript
     ,@params = N'@ImportFilePath NVARCHAR(MAX),@ImportAll BIT,@CombineTarget BIT,@ExcelFileName NVARCHAR(200),@ExcelSheetName NVARCHAR(50),@Serv NVARCHAR(200)'
     ,@ImportFilePath = @ImportPath
     ,@ImportAll = @ImportAll
     ,@CombineTarget = @CombineTarget
     ,@ExcelFileName = @ExcelFileName
     ,@ExcelSheetName = @ExcelSheetName
     ,@Serv = @Serv
     IF @ImportAll = 1 AND @CombineTarget = 1
     BEGIN
        IF OBJECT_ID('Tbl_PyImpExp1') IS NOT NULL
            BEGIN
                    DROP TABLE IF EXISTS Tbl_PyImpExp2
                    ;WITH FileList
                    AS (
                    SELECT [Filename]
                          ,[sheetname]
                          ,[Headers]
                          ,[FullFilePath]
                          ,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) Rn
                          ,ROW_NUMBER() OVER(PARTITION BY [Headers] ORDER BY [Headers]) Grp
                          ,DENSE_RANK() OVER(ORDER BY [Headers]) Grp1
                      FROM [dbo].[Tbl_PyImpExp1]
                      )
                      SELECT *,FIRST_VALUE([Filename]) OVER (PARTITION BY Grp1 ORDER BY Grp ASC) AS TableName
                      INTO Tbl_PyImpExp2
                      FROM FileList
            END
                IF EXISTS (SELECT 1 FROM Tbl_PyImpExp2)
                BEGIN
                     DECLARE @I INT = 1
                            ,@SQL NVARCHAR(MAX) = N''
SET @PythonScript = CONCAT('
import pandas as pd
from revoscalepy import RxSqlServerData, rx_data_step
sqlConnString = "Driver=SQL Server;Server="+Serv+";',@DBConnectionString,'"
if ImportAll == 1 and CombineTarget == 1:
   FinalImport = pd.DataFrame()
   for index, row in InputDataSet.iterrows():
       Tbl = "".join(T for T in row["TableName"] if T.isalnum())
       Import = pd.read_excel(row["FullFilePath"], sheetname=row["sheetname"], na_filter=False).astype(str)
       Import["ImportKey"] = row["TableName"]+"_"+row["sheetname"]
       FinalImport = FinalImport.append(Import,ignore_index=True)
       if not FinalImport.empty:
          sqlDS = RxSqlServerData(connection_string = sqlConnString,table = Tbl)
          rx_data_step(input_data = FinalImport, output_file = sqlDS,overwrite = True)')
                     WHILE @I <= (SELECT MAX(Grp1) FROM Tbl_PyImpExp2)
                     BEGIN
                     SET @SQL = CONCAT('SELECT FullFilePath,sheetname,TableName FROM Tbl_PyImpExp2 WHERE Grp1 = ',@I)
                         EXEC sp_execute_external_script
                          @language = N'Python'
                         ,@script = @PythonScript
                         ,@input_data_1 = @SQL
                         ,@params = N'@ImportAll BIT,@CombineTarget BIT,@Serv NVARCHAR(200)'
                         ,@Serv = @Serv
                         ,@ImportAll = @ImportAll
                         ,@CombineTarget = @CombineTarget
                     SET @I = @I + 1
                     END
                 END
     DROP TABLE IF EXISTS Tbl_PyImpExp1,Tbl_PyImpExp2
     END

     END ELSE PRINT 'Invalid parameters: If ImportAll = 0 then pass Excel file & sheet name as input. If ImportAll = 1 then pass Excel file & sheet name blank'

  END ELSE PRINT 'Invalid folder path'
END ELSE PRINT 'Import folder path or database connection string needs to be mentioned'

 END TRY
BEGIN CATCH
PRINT 'Issue while executing this SP, please check whether there is permission to execute the script / to access the folder and input params are valid'
END CATCH
END
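The Tbl_PyImpExp2 step above groups files whose sheets share an identical header list and routes them to one target table, named after the first such file (the FIRST_VALUE window function). The grouping idea can be sketched in plain Python; the file names and headers below are hypothetical illustrations:

```python
# Files whose sheets share the same header list go to one combined target
# table, named after the first such file (FIRST_VALUE equivalent).
# File names and headers are hypothetical illustrations.
files = [
    ("SalesJan", "OrderID,Amount"),
    ("SalesFeb", "OrderID,Amount"),
    ("Stock",    "ItemID,Qty"),
]
target_table = {}   # headers -> first file name seen with those headers
assignments = {}    # file -> target table it will be loaded into
for filename, headers in files:
    target_table.setdefault(headers, filename)
    assignments[filename] = target_table[headers]
print(assignments)
```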




Please see my technet wiki article for detailed explanation on this topic with examples and screenshots and you can also download the scripts and SSRS report from technet gallery site.

We need to be careful with indentation in the Python script, as reformatting it without understanding the code can result in errors.



How to use R script in Power BI

In this post, I have embedded the video link to my YouTube channel, and in the video I show how to use R script in Power BI. I have also posted below the scripts shown in the video.

In the video, I have explained:

1) How to use R script in Power BI
2) Example dataset - the COVID-19 (2019-nCoV) GitHub data repository by Johns Hopkins University, and how to fetch this data dynamically using RCurl
3) How to configure a specific R version in R Studio & Power BI
4) Sample visualization in Power BI using the R dataset






--To enable external scripts execution
EXEC sp_configure  'external scripts enabled', 1
RECONFIGURE WITH OVERRIDE

--To print R version
EXECUTE sp_execute_external_script @language = N'R'
    , @script = N'print(version)';
GO

--To list all R packages installed

EXEC sp_execute_external_script @language = N'R'
    , @script = N'
OutputDataSet <- data.frame(installed.packages()[,c("Package", "Version", "Depends", "License", "LibPath")]);'
WITH RESULT SETS((
            Package NVARCHAR(255)
            , Version NVARCHAR(100)
            , Depends NVARCHAR(4000)
            , License NVARCHAR(1000)
            , LibPath NVARCHAR(2000)

            ));

R script used in Power BI:
library(RCurl)

dt <- format(as.Date(Sys.time())-1, "%m-%d-%Y")
uri1 <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"

uri <- paste0(uri1,dt,".csv")

rawdata <- getURL(uri)

convidds <- read.csv(text = rawdata)
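For readers more comfortable in Python, the date-based URL construction in the R snippet above can be sketched equivalently with the standard library (this is an illustrative translation, not part of the Power BI script):

```python
from datetime import date, timedelta

# Build yesterday's daily-report URL for the Johns Hopkins COVID-19
# repository, mirroring the R snippet above (format mm-dd-yyyy).
base = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
        "csse_covid_19_data/csse_covid_19_daily_reports/")
dt = (date.today() - timedelta(days=1)).strftime("%m-%d-%Y")
uri = base + dt + ".csv"
print(uri)
```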


See Also:

My other posts on Power BI

SQL Server Import/Export excel using R





