Channel: All about SQLServer

How to create linked server between On-premise and Azure SQL data warehouse

In my previous post, I wrote about how to create a linked server between an on-premises SQL Server and Azure SQL Database.

In this post, let us see how to create a linked server between an on-premises SQL Server and Azure SQL Data Warehouse and, after establishing the linked server, how we can query Azure SQL Data Warehouse tables from the local on-premises SQL Server.

I created the Azure SQL Data Warehouse in my previous post, and for this example I have created a table "dwtable1" within the Azure SQL DW.

Below are the steps to create a linked server connection between the on-premises SQL Server and Azure SQL DW.




From SSMS, connect to the on-premises SQL Server. Go to Object Explorer -> Server Objects -> Linked Servers

Linked server - the linked server name. I have provided the linked server name AZSQLDW.
Provider - SQL Server Native Client 11.0
Data source - the Azure SQL Server name
Catalog - the Azure SQL data warehouse name. I have created the Azure SQL DW with the name trnazsqldw




Provide the remote login & password (Azure SQL Server authentication):
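The same linked server can also be scripted in T-SQL instead of using the SSMS dialog. Below is a minimal sketch; the data source, remote user and password are placeholders to be replaced with your own Azure SQL Server details.

```sql
-- Sketch: create the linked server to Azure SQL DW (placeholder server name)
EXEC master.dbo.sp_addlinkedserver
     @server     = N'AZSQLDW'
    ,@srvproduct = N''
    ,@provider   = N'SQLNCLI11'              -- SQL Server Native Client 11.0
    ,@datasrc    = N'yourserver.database.windows.net'
    ,@catalog    = N'trnazsqldw';

-- Map local logins to the remote SQL authentication credentials (placeholders)
EXEC master.dbo.sp_addlinkedsrvlogin
     @rmtsrvname  = N'AZSQLDW'
    ,@useself     = N'False'
    ,@locallogin  = NULL
    ,@rmtuser     = N'remoteuser'
    ,@rmtpassword = N'********';
```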



Now we can query the Azure SQL data warehouse from the local on-premises SQL Server using the four-part naming convention, as shown below:

SELECT * FROM Linkedservername.databasename.schemaname.tablename





--Select data in azure sql dw from on-premise sql server using four part naming convention
SELECT * FROM AZSQLDW.[trnazsqldw].dbo.dwtable1

--Insert data into azure sql dw from on-premise sql server using EXEC command
EXEC ( 'INSERT [trnazsqldw].dbo.dwtable1 (Id)
VALUES(1)' ) AT AZSQLDW;

--Update data in azure sql dw from on-premise sql server using four part naming convention
UPDATE AZSQLDW.[trnazsqldw].dbo.dwtable1 SET Id = 100
WHERE Id = 1

--Delete data in azure sql dw from on-premise sql server using four part naming convention
DELETE FROM AZSQLDW.[trnazsqldw].dbo.dwtable1
WHERE Id = 1


See Also:


SQL Server Export/Import excel using R script - Part 1


In this post, let us see how we can easily import from or export to Excel with just a few lines of R script executed from SSMS using sp_execute_external_script.

I have done the below example using SQL Server 2016 on Windows 10 (64-bit) and R (3.2.2).
Refer to my previous post on how to install R Services.

Also, I have used the "WideWorldImporters" sample database for this example.



Now let us jump to the prerequisite steps for this approach. Though the prerequisite steps are a bit heavy, they are one-time and thereafter enable us to export to / import from Excel easily with just a few lines of code.

Prerequisite Step 1: Check the version of R by running sessionInfo() from R.exe

In my case, I have installed SQL Server 2016, so R.exe is found in the below path:
C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\bin
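For reference, the check is a single call from the R console; the version lines in the comment below are what my 3.2.2 install printed and will differ on other machines.

```r
# Run from R.exe to confirm the R version and whether it is 32- or 64-bit
sessionInfo()
# The first lines of the output look similar to:
#   R version 3.2.2 (2015-08-14)
#   Platform: x86_64-w64-mingw32/x64 (64-bit)
```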



Prerequisite Step 2: From the above step I learned that it is 64-bit, so I downloaded the 64-bit Java JDK (.exe file) from here & installed it.
C:\Program Files\Java\jdk-11.0.

Prerequisite Step 3: Now install rJava package from R.exe

install.packages("rJava")




Prerequisite Step 4: Now let us set the JAVA_HOME environment variable to the Java JDK installed in Step 2 and load the rJava package to check that it is working from R.exe.

Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava)



Prerequisite Step 5: Now it's time to install & load the xlsx package from R.exe.
- The xlsx package is one of the powerful R packages to read, write and format Excel files.
- It is a Java-based solution and is available for Windows, Mac and Linux.
- It works for both Excel 2007 and Excel 97/2000/XP/2003 file formats (xls and xlsx).

install.packages("xlsx")
library("xlsx")



Prerequisite Step 6: Copy & paste the packages downloaded in the above steps from the default R library to the SQL Server R_SERVICES library.

I have copied the xlsx, xlsxjars & rJava folders from C:\Users\Sathy\Documents\R\win-library\3.2
to C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\library

Now, as the last prerequisite step, let us check the installed R packages from SSMS. To execute R script using sp_execute_external_script, refer to my previous post linked at the beginning of this post.

EXEC sp_execute_external_script @language = N'R', @script = N'OutputDataSet <- data.frame(installed.packages())'





Now, the script to export SQL Server table data into Excel:
The below script is pretty simple: I have selected only the first six key fields
from the [WideWorldImporters].[Sales].[Orders] table and passed the result as the input data set (data frame) to the write.xlsx2 function.

USE [WideWorldImporters]

DECLARE @SQL NVARCHAR(MAX) = N'
SELECT [OrderID]
,[CustomerID]
,[SalespersonPersonID]
,[PickedByPersonID]
,[ContactPersonID]
,[BackorderOrderID]
FROM [WideWorldImporters].[Sales].[Orders];'

DECLARE @RScript NVARCHAR(MAX) =N'
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
write.xlsx2(InputDataSet, file = "H:/Temp/Blog/ImportExportExcel/Sales.xlsx", sheetName="Order",
col.names=TRUE, append=FALSE) '

EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL


73,595 records were exported in ~10 seconds. Note that I have used "write.xlsx2" in the above R script, as it is faster than "write.xlsx".

More information on this topic, to be continued in part 2 ...

See Also: 

SQL Server Export/Import excel using R script - Part 2


In continuation of my previous post, SQL Server Export/Import excel using R script - Part 1, let us see a few other examples in this post.

In the previous post, I selected only the first six key fields from the [WideWorldImporters].[Sales].[Orders] table and passed the result as the input data set (data frame) to the write.xlsx2 function.

If instead I run "SELECT * FROM [WideWorldImporters].[Sales].[Orders]", I get the error shown in the below screenshot:



This is because data type conversions are performed implicitly, and some of the data types are not supported when data is passed between R libraries and SQL Server.


To work around this, we can exclude from the export the fields whose data types are not supported, as well as those that don't make sense to export (for example, a field with the image data type).

To work around the issue of data types that are poorly converted when data is passed between R and SQL Server, we can make use of TRY_CONVERT(VARCHAR(MAX), field name).
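As a quick illustration, a couple of date/time columns from the same Orders table could be wrapped like this (a sketch, not part of the script below):

```sql
-- TRY_CONVERT returns NULL instead of failing, so columns with data types
-- that convert poorly can still be passed to the R script as VARCHAR(MAX)
SELECT [OrderID]
      ,TRY_CONVERT(VARCHAR(MAX), [OrderDate])            AS [OrderDate]
      ,TRY_CONVERT(VARCHAR(MAX), [PickingCompletedWhen]) AS [PickingCompletedWhen]
FROM [WideWorldImporters].[Sales].[Orders];
```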

So in the below example, I am going to export the data of all tables (non-temporal tables) that fall under the "Sales" schema of the WideWorldImporters sample database.


USE [WideWorldImporters]
GO
SET NOCOUNT ON
DROP TABLE IF EXISTS #ExportTablesList, #FinalExportList

SELECT CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) Columns -- To cover poor data type conversions
,T.name TableName
,DENSE_RANK()OVER(ORDER BY T.name) TableCount
,TY.name DataType
INTO #ExportTablesList
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) = 'Sales'
AND temporal_type = 0 --NON_TEMPORAL_TABLE
AND TY.name <> 'image' -- Ignore the datatypes that are not required to be exported

SELECT TableName,MAX(TableCount) TableCount
, STUFF(
(
SELECT ', ' + C.Columns
From #ExportTablesList As C
WHERE C.TableName = T.TableName
FOR XML PATH('')
), 1, 2, '') AS Columns
INTO #FinalExportList
From #ExportTablesList As T
GROUP BY TableName

DECLARE @I INT = 1
,@TableName NVARCHAR(200)
,@SQL NVARCHAR(MAX) = N''
,@RScript NVARCHAR(MAX) = N''



WHILE @I <= (SELECT MAX(TableCount) FROM #FinalExportList)
BEGIN

SELECT @SQL = CONCAT('SELECT TOP 10 ',Columns,' FROM [WideWorldImporters].[Sales].',QUOTENAME(TableName),';')
,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I

SET @RScript = CONCAT('
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
write.xlsx2(InputDataSet, file = "H:/Temp/Blog/ImportExportExcel/Sales.xlsx", sheetName= "',@TableName,'",
col.names=TRUE, append=TRUE) ')

EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL


SET @I = @I + 1
END

In the above example, I have exported into a single Excel file with multiple sheets, using the table name as the sheet name. This is achieved by setting the argument "append=TRUE" in the write.xlsx2 function.

While exporting multiple tables' data, any specific configuration (say, exporting the top 100 records, ordering by a key field, or filtering on a particular field) can be done by tweaking the SQL that is built dynamically, as I have done in the above example.
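For instance, the line that builds @SQL in the above script could be tweaked as below to export the latest 100 rows of each table (a sketch; ORDER BY 1 DESC assumes the first column of each table is an ascending key):

```sql
SELECT @SQL = CONCAT('SELECT TOP 100 ', Columns,
                     ' FROM [WideWorldImporters].[Sales].', QUOTENAME(TableName),
                     ' ORDER BY 1 DESC;')
      ,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I
```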




Similarly, the data can also be exported to different Excel files (instead of different sheets within a single file), with dynamic file names, by tweaking the append & file arguments in the write.xlsx2 function. The below script does this:


USE [WideWorldImporters]
GO
SET NOCOUNT ON
DROP TABLE IF EXISTS #ExportTablesList, #FinalExportList

SELECT CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) Columns -- To cover poor data type conversions
,T.name TableName
,DENSE_RANK()OVER(ORDER BY T.name) TableCount
,TY.name DataType
INTO #ExportTablesList
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) = 'Sales'
AND temporal_type = 0 --NON_TEMPORAL_TABLE
AND TY.name <> 'image' -- Ignore the datatypes that are not required to be exported

SELECT TableName,MAX(TableCount) TableCount
, STUFF(
(
SELECT ', ' + C.Columns
From #ExportTablesList As C
WHERE C.TableName = T.TableName
FOR XML PATH('')
), 1, 2, '') AS Columns
INTO #FinalExportList
From #ExportTablesList As T
GROUP BY TableName

DECLARE @I INT = 1
,@TableName NVARCHAR(200)
,@SQL NVARCHAR(MAX) = N''
,@RScript NVARCHAR(MAX) = N''



WHILE @I <= (SELECT MAX(TableCount) FROM #FinalExportList)
BEGIN

SELECT @SQL = CONCAT('SELECT TOP 10 ',Columns,' FROM [WideWorldImporters].[Sales].',QUOTENAME(TableName),';')
,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I

SET @RScript = CONCAT('
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
write.xlsx2(InputDataSet, file = "H:/Temp/Blog/ImportExportExcel/',@TableName,'.xlsx", sheetName= "',@TableName,'",
col.names=TRUE, append=FALSE) ')

EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL


SET @I = @I + 1
END






More information on this topic, to be continued in part 3 ...

See Also:  

Reference: 

SQL Server Export/Import excel using R script - Part 3

In continuation of my previous posts on SQL Server Export/Import excel using R script - Part 1 & Part 2, where we saw how to export to Excel, let us now see how to import Excel into a SQL Server table using R script.

In the Part 2 post, I exported a few tables under the "Sales" schema from the "WideWorldImporters" sample database into a single Excel file with six sheets:

Sheet 1 - CustomerTransactions
Sheet 2 - InvoiceLines
Sheet 3 - Invoices
Sheet 4 - OrderLines
Sheet 5 - Orders
Sheet 6 - SpecialDeals

While importing, we have two scenarios:
1) Excel file structure is defined & known
2) Excel file structure is unknown and the table has to be created dynamically

Let us see an example of each scenario:




Scenario 1: Excel file structure is defined & known

For this scenario, let us try to import sheet 1 by setting the argument sheetName = 1 in read.xlsx2 function. As we know the file structure, table has been created beforehand.

SET NOCOUNT ON
DROP TABLE IF EXISTS CustomerTransactions_Import

--As we know the Excel file structure beforehand, creating an empty table to import the data from Excel sheet
SELECT NULL AS RowId,* 
INTO CustomerTransactions_Import 
FROM [Sales].[CustomerTransactions]
WHERE 1 = 2

DECLARE @RScript NVARCHAR(MAX) =N' 
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
    require(rJava);
require("xlsx");
    OutputDataSet <- read.xlsx2("H:/Temp/Blog/ImportExportExcel/Sales.xlsx", sheetName = 1,header=TRUE)
'
--Inserting from Excel
INSERT dbo.CustomerTransactions_Import
EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript

--SELECT * FROM CustomerTransactions_Import

--Below query compares the data in source table and data imported from Excel
SELECT Src.CustomerTransactionID,Imp.CustomerTransactionID 
FROM [Sales].[CustomerTransactions] Src
LEFT JOIN CustomerTransactions_Import Imp
ON Src.CustomerTransactionID = Imp.CustomerTransactionID






Scenario 2: Excel file structure is unknown and table has to be created dynamically

For this scenario, let us try to import sheet 2 by setting the argument sheetName = 2 in the read.xlsx2 function. As we don't know the file structure, the table will be built dynamically using the RevoScaleR function rxDataStep and populated with the data from the read.xlsx2 function.

DECLARE @RScript NVARCHAR(MAX) =N' 
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
    require(rJava);
require("xlsx");
    Output <- read.xlsx2("H:/Temp/Blog/ImportExportExcel/Sales.xlsx", sheetName = 2,header=TRUE)

sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=WideWorldImporters;Uid=sa;Pwd=***"
    sqlTable <- "InvoiceLines_Import"
if (rxSqlServerTableExists(sqlTable,  connectionString = sqlConnString))  rxSqlServerDropTable(sqlTable,  connectionString = sqlConnString)
    sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
    rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)'

EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript






The sqlConnString variable needs to be set with the server name, database name, username & password for the ODBC SQL Server connection.
The sqlTable variable is set with the name of the table to be created.


 More information on this topic, to be continued in part 4 ...


See Also:

SQL Server 2019 Installation

In this post, let us see how to install the SQL Server 2019 CTP 2.2 Evaluation Edition on a Windows 10 (64-bit) machine.



Download SQL Server 2019 for Windows from the link provided at the bottom of this post:




After downloading, right-click the setup file and select Run as Administrator.


Select the installation type, I did a Custom installation so that I can select the features to be installed:



Click on Hardware and Software requirements. The MSDN site will open; check whether the basic requirements of your machine (like OS and disk space) are sufficient for installation.


Click on Installation -> New SQL Server stand-alone installation:




We can ignore the warnings: 


I have selected all the features except PolyBase Query Service,
as PolyBase Query Service requires Oracle JRE, which is not installed on my machine.


Provide an instance name if any other version of SQL Server is already installed on the same machine, or use the default instance.



Select the authentication type and Add Current User as SQL Server administrators:



SSIS Scale out configuration:



Accept the consent to install R & Python:



After clicking on Install, it took around 30 mins.



Some of the tools & services are not installed as part of the above installation.
Open the setup file -> Installation (select the tool or service to be installed),
or we can download it from the web & install it separately.

I tried to install the SSMS tool (by clicking the option shown in the below screenshot):
The MSDN site opened in IE with the download link.





What's new in SQL Server 2019

SQL Server 2019 can be downloaded from here


Merry Christmas!!

SQL Server Export/Import excel using R script - Part 4

In continuation of my previous posts on SQL Server Export/Import excel using R script - Part 1, Part 2 & Part 3, where we saw how to export to Excel and import into a SQL Server table using the "xlsx" package, which is powerful but has an external dependency on Java (the "rJava" package).

In this post, let us see how to import an Excel file with multiple sheets into different SQL Server tables using the "readxl" package.


I have done the below example using SQL Server 2019 on Windows 10 (64-bit) and R (3.4.4).
Refer to my previous post on how to install R Services.

I have also used the WideWorldImporters sample database for data setup in this example.
Before trying to import an Excel file with multiple sheets into SQL Server tables using the "readxl" package, I tried to re-create the Excel file using the script provided in Part 2.

I got the below error:








Error in .jnew("java/io/FileOutputStream", jFile) : 

  java.io.FileNotFoundException:  (Access is denied)


This is because of the Java language extension support in SQL Server 2019, so I had to grant folder access (read, write) to "ALL_APPLICATION_PACKAGES".




Step 1:

Now let us install the required packages for this example:

Open R.exe from below path:
C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin

From R.exe, execute install.packages("readxl") 

Step 2:

After successful installation of the above package, copy & paste the packages downloaded in the above step from the default R library to the SQL Server R_SERVICES library.

From C:\Users\Sathy\Documents\R\win-library\3.4
to C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\library

Step 3:

The below script will import an Excel file with six sheets into six different tables in a database called "Staging", which I created before executing this script:

Sheet 1 - CustomerTransactions
Sheet 2 - InvoiceLines
Sheet 3 - Invoices
Sheet 4 - OrderLines
Sheet 5 - Orders
Sheet 6 - SpecialDeals



DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
library(foreach);
    filepath_var <- "H:/Temp/Blog/ImportExportExcel/Sales.xlsx"
    tab_names <- excel_sheets(path = filepath_var)
sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

foreach(n = 1:length(tab_names)) %do%
{
sqlTable <- tab_names[n]
Output <- read_excel(path = filepath_var, sheet = n)
sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
}'

EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript



I have updated the above script to make sure it doesn't error out if the file does not exist in the path or if any of the sheets is empty. Below is the updated script:

To test the script, I opened the "Sales.xlsx" file, added an empty sheet and saved it.



DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
    library(foreach);
    
filepath_var <- "H:/Temp/Blog/ImportExportExcel/Sales.xlsx"
sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

#Check file exists in path
if (file.exists(filepath_var)) {

    tab_names <- excel_sheets(path = filepath_var)
    
   foreach(n = 1:length(tab_names)) %do%
   {
  sqlTable <- tab_names[n]
  Output <- read_excel(path = filepath_var, sheet = n)
   
   #check  if the sheet has header 
   if (dim(Output)[2] != 0) {

  sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
  rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
  }
  }
  }'

EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript




dim(Output)[1] - gives us the number of rows
dim(Output)[2] - gives us the number of columns

In this example, if there are no columns, I assume the sheet is empty and do no further processing.


More information on this topic, to be continued in part 5 ...

See Also:

SQL Server Export/Import excel using R script - Part 5

In continuation of my previous posts on SQL Server Export/Import excel using R script Part 1, Part 2, Part 3 & Part 4, let us see in this post how to import multiple Excel files in a folder (including sub-folders) into different SQL Server tables created dynamically based on the Excel file structure.




For this example, I am going to make use of the Excel files generated using the second script in the Part 2 post.
I have also created a sub-folder named "Copy" and copied the same set of files into it, to verify that the script traverses all the folders and works fine.

Also, to test that the script can ignore empty sheets, I manually opened one or two Excel files, added an empty sheet and saved them.

In this example, the SQL Server table name is formed as file name + sheet name, with all special characters and spaces removed from the file/sheet name, to make sure the SQL Server table name is unique and per MSSQL standards.





As you can see in the above screenshot, the table names prefixed with Copy are imported from the sub-folder.

Below is the script that iterates through multiple files with multiple sheets and creates each as a separate table in SQL Server:



DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
    library(foreach);
    sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

    list_filespath <- list.files(path = "H:/Temp/Blog/ImportExportExcel", pattern = "*.xlsx", all.files = FALSE,
           full.names = TRUE, recursive = TRUE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)

    list_filenames <- list.files(path = "H:/Temp/Blog/ImportExportExcel", pattern = "*.xlsx", all.files = FALSE,
                        full.names = FALSE, recursive = TRUE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)

#Check file exists in path
    if (length(list_filespath) != 0) {

 foreach(m = 1:length(list_filespath)) %do%
 {
    filepath_var <- list_filespath[m]
    tab_names <- excel_sheets(path = filepath_var)

 foreach(n = 1:length(tab_names)) %do%
 {
  sqlTable <- paste(strsplit(list_filenames[m],split = ".xlsx"),tab_names[n],sep = "_") 
  sqlTable <- gsub("\\s", "", sqlTable)
  sqlTable <- gsub("[^[:alnum:]]", "", sqlTable)
  Output <- read_excel(path = filepath_var, sheet = n)

   #check  if the sheet has header 
   if (dim(Output)[2] != 0) {

  sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
  rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
 }
 }
 }
 }'

EXEC   sp_execute_external_script
      @language = N'R'

     ,@script = @RScript



More information on this topic, to be continued in part 6 ..


See Also:

SQL Server Export/Import excel using R script - Part 6

In continuation of my previous posts on SQL Server Export/Import excel using R script Part 1, Part 2, Part 3, Part 4 & Part 5, let us see in this post how to invoke the .sql files in a folder and export the output of each query as an Excel file.

As a prerequisite, we need the R packages "xlsx" & "rJava"; I have shown how to install and load them in my previous posts.

As we are going to make use of the PowerShell SQL Server cmdlet "Invoke-Sqlcmd", let us make sure the SQL Server PowerShell modules are downloaded & installed from PowerShell.

For this example, I have created below folder structure:







Temp folder - a placeholder for staging files created in the intermediate process
Script folder - has the R script that runs this entire process
BusinessReports folder - where the Excel reports are exported
SQL folder - contains the SQL files to be executed

For this example, I have placed three .sql files in the folder; the SQL inside each file is shown in the below screenshot. All three SQL queries select data from three different tables in the "WideWorldImporters" sample database.







Explanation of how this approach works:

R invokes the PowerShell command using the "system2" function. The PowerShell command then reads each SQL file (using Invoke-Sqlcmd) and writes the SQL result to a text file (just as staging). Only one staging file is created for the entire process of exporting the SQL files' output to Excel, and it is deleted at the end of the process. Using the data in the staging text file on each iteration, an Excel file is created with the "xlsx" R package.

Below R script does everything what I have mentioned in above explanation:





I have saved the above script in the Script folder; it can either be manually opened & executed from any R environment (R Studio etc.) or invoked from the command line.



The below screenshot shows how to invoke the R script from Command Prompt by using Rscript.exe in the SQL Server folder:

Go to Run -> cmd ->



This command line can also be created as a SQL Agent job and scheduled as per requirement.

CD "C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin"
Rscript.exe "H:\Temp\Blog\ImportExportExcel\Script\SQLExportExcel.R"

After executing the R script from Command Prompt or R Studio:
We can see the output of the SQL files in Excel format within a directory created with the date; the Excel file & sheet name is the SQL file name.




Executing the R script on the same day just overwrites the directory & Excel files without error, and a separate directory is created when the script is executed on the next day.

I tested the scenario where one of the SQL files has incorrect syntax; the script still works for all the other SQL files and runs the entire process.

Using the password argument in the write.xlsx2 function, the exported Excel files can be password-protected. But unfortunately, writing password-protected workbooks is supported for the Excel 2007 OOXML format only.

Below is the complete code block:

library(foreach);
Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk-11.0.1/")
library(rJava);
library("xlsx");
#getting list of sql files & file names from folder
list_filespath <- list.files(path = "H:/Temp/Blog/ImportExportExcel/SQL", pattern = ".sql", all.files = FALSE,
                             full.names = TRUE, recursive = FALSE,
                             ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
list_filenames <- list.files(path = "H:/Temp/Blog/ImportExportExcel/SQL", pattern = ".sql", all.files = FALSE,
                             full.names = FALSE, recursive = FALSE,
                             ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
#Check file exists in path
if (length(list_filespath) != 0) {
  #creating new directory on each day for SQL output 
  newdir <- paste("H:/Temp/Blog/ImportExportExcel/BusinessReports/Report", Sys.Date(), sep = "_")
  dir.create(newdir)
  foreach(m = 1:length(list_filespath)) %do%
  {
  #forming the powershell command to invoke sql files  
  command1 <- "Invoke-Sqlcmd -ServerInstance 'LAPTOP-LCUB6HBB' -Database WideWorldImporters  -InputFile "
  command3 <- "| Export-Csv 'H:/Temp/Blog/ImportExportExcel/Temp/Stagingfile.txt'"
  command <- paste(command1,list_filespath[m],command3, sep="")
  system2("powershell", args = command)
  #powershell Invoke-sqlcmd output to staging file in .txt format
  StagingOut <- read.table('H:/Temp/Blog/ImportExportExcel/Temp/Stagingfile.txt', skip = 1, header =TRUE, sep =',')
  #create excel output using .txt staging file
  ExcelFilename <- paste(newdir,"/",gsub(".sql", "", list_filenames[m]),".xlsx", sep="")
  write.xlsx2(StagingOut, file = ExcelFilename, sheetName=list_filenames[m],
              col.names=TRUE,  append=FALSE, password= "Password123")
  }
}
#Delete staging file (intermediate file before exporting to excel) if it exist
if (file.exists('H:/Temp/Blog/ImportExportExcel/Temp/Stagingfile.txt')) 
{
  file.remove('H:/Temp/Blog/ImportExportExcel/Temp/Stagingfile.txt')
}



See Also:

SQL Server Export/Import excel using R script


Happy new year 2019 !!

SQL Server Export/Import excel using R script

In this post, let us see a new approach to import Excel into SQL Server and export SQL Server data to Excel. There are some existing methods to do this using BCP, BULK INSERT, the Import & Export wizard in SSMS, SSIS, Azure Data Factory, linked server & OPENROWSET queries, and SQLCMD.

The BULK INSERT statement, the BCP tool, and Azure Data Factory can't read Excel files directly.
BCP - a workaround has to be done to include the header.
SSIS - with a dynamic source & destination, handling the mapping increases the complexity of the package.
SQLCMD - cannot export output to Excel.

The R & Python language extensions were introduced in SQL Server 2016 & 2017 as part of machine learning. With support for R in Azure SQL Database and Java language extension support in SQL Server 2019, this new approach can be used extensively, as it is easy, fast and flexible.




The below series of posts has some examples of how to install various R packages (xlsx - read/write Excel, and readxl - used for reading Excel).

I have also shown examples of:

i) how to export SQL Server data into an Excel file with multiple tabs, with table names as sheet names

ii) how to export SQL Server data into multiple Excel files, with the table name as file & sheet name

iii) how to import an Excel file with multiple sheets into different SQL Server tables, creating the target schema dynamically with the sheet name as the table name

iv) how to import multiple Excel files in a directory (including sub-folders) with multiple sheets into different SQL Server tables, creating the target schema dynamically with file name + sheet name as the table name

v) how to export the output of SQL files stored in a folder to Excel files


SQL Server Export/Import excel using R script - Part 1
SQL Server Export/Import excel using R script - Part 2
SQL Server Export/Import excel using R script - Part 3
SQL Server Export/Import excel using R script - Part 4
SQL Server Export/Import excel using R script - Part 5
SQL Server Export/Import excel using R script - Part 6

The above examples are done using the base R packages & functions; very few packages are explicitly installed to achieve some Excel operations.

With respect to moving data (import/export) between SQL Server & Excel, there are various scenarios based on each requirement. I have covered some of them, but tweaking the solutions mentioned above can cover any scenario.

I'll keep updating this article with different examples. If anyone has specific requirement or any issues trying above examples, please comment.

See Also:

SQL Server export adhoc SQL or table to CSV file

SQL Server export adhoc SQL or table to CSV file

In this post, let us see how to export ad hoc SQL query output, or a list of tables and views, to a CSV file. I am going to make use of an R script to do this.

I have tried this example with SQL Server 2019 & WideWorldImporters sample database.

For this approach, I have created a scalar function and a stored procedure.

Scalar function - returns a unique table name for a session. This function is required only if an ad hoc SQL query has to be exported to a CSV file.

Stored procedure - has the below three input parameters and writes the output to a CSV file:
i)   @ExportPath - path for exporting the CSV file
ii)  @ObjectlisttoExport - list of tables and views to be exported
iii) @Objectlistdelim - delimiter for the above object list
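Once the function and procedure below are created, a call like the following exports each listed object to its own CSV file (the path and object names here are just examples):

```sql
EXEC usp_ExportCSV
     @ExportPath         = N'H:\Temp\Blog\ImportExportExcel\'
    ,@ObjectlisttoExport = N'Sales.Orders,Sales.Invoices'
    ,@Objectlistdelim    = ','
```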



This script works on SQL Server 2017 and above. It can actually be made to work on SQL Server 2016 and above by modifying the STRING_AGG call used inside the stored procedure to combine the list of columns into a comma-separated string. The approach will not work on versions below SQL Server 2016, as executing the R language from T-SQL was introduced in SQL Server 2016.
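For SQL Server 2016, the STRING_AGG aggregation inside the procedure could be replaced with the classic FOR XML PATH + STUFF pattern; below is a sketch against the same @TableColList table variable used in the procedure.

```sql
-- SQL Server 2016-compatible replacement for the STRING_AGG aggregation:
-- build the comma-separated column list per table with FOR XML PATH + STUFF
DROP TABLE IF EXISTS #ExportTablesList
SELECT IDENTITY(INT,1,1) Rn
      ,Tbl
      ,STUFF((SELECT ',' + C.Cols
              FROM @TableColList C
              WHERE C.Tbl = T.Tbl
              FOR XML PATH('')), 1, 1, '') AS Cols
INTO #ExportTablesList
FROM @TableColList T
GROUP BY Tbl
```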


Scalar function:

USE [WideWorldImporters]
GO

CREATE OR ALTER FUNCTION fngetcsvtblname()
RETURNS NVARCHAR(128)
AS
BEGIN
DECLARE @tblname NVARCHAR(128)
SELECT @tblname = CONCAT('ExportCSV_',@@SPID,DB_ID(),'_temptbl')
RETURN @tblname
END;
GO





Stored procedure:




CREATE OR ALTER PROC usp_ExportCSV (@ExportPath NVARCHAR(MAX),
@ObjectlisttoExport NVARCHAR(MAX),
@Objectlistdelim CHAR(1)
)
AS
BEGIN
SET NOCOUNT ON;

IF ISNULL(@ExportPath,'') <> ''
BEGIN

SELECT @ExportPath = REPLACE(@ExportPath,'\','/')

DECLARE @tblname NVARCHAR(128)
SELECT @tblname= dbo.fngetcsvtblname()

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tblname) OR (ISNULL(@ObjectlisttoExport,'') <> '' AND ISNULL(@Objectlistdelim,'') <> '')
BEGIN


DECLARE @TableColList TABLE (Cols NVARCHAR(MAX),Tbl NVARCHAR(128))

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tblname)
BEGIN
INSERT @TableColList
SELECT CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) Columns -- To cover poor data type conversions
,CONCAT(S.name,'.',O.name) TableName
FROM sys.objects O
JOIN sys.schemas S
ON S.schema_id = O.schema_id
JOIN sys.columns C
ON O.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE CONCAT(S.name,'.',O.name) = (SELECT CONCAT(schema_name(schema_id),'.',name ) FROM sys.tables WHERE name = @tblname)
AND TY.name <> 'image' -- Ignore the datatypes that are not required to be exported
AND O.type IN ('U','V')
END
IF ISNULL(@ObjectlisttoExport,'') <> ''
BEGIN
INSERT @TableColList
SELECT CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME)) Columns -- To cover poor data type conversions
,CONCAT(S.name,'.',O.name) TableName
FROM sys.objects O
JOIN sys.schemas S
ON S.schema_id = O.schema_id
JOIN sys.columns C
ON O.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE CONCAT(S.name,'.',O.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoExport, @Objectlistdelim))
AND TY.name <> 'image' -- Ignore the datatypes that are not required to be exported
AND O.type IN ('U','V')
END

DROP TABLE IF EXISTS #ExportTablesList
SELECT IDENTITY(INT,1,1) Rn
,tbl
,STRING_AGG(cols,',') AS cols
INTO #ExportTablesList
FROM @TableColList
GROUP BY tbl

DECLARE @I INT = 1
,@SQL NVARCHAR(MAX) = N''
,@RScript NVARCHAR(MAX) = N''
,@filename NVARCHAR(MAX) = N''

WHILE @I <= (SELECT MAX(Rn) FROM #ExportTablesList)
BEGIN
--just for testing selecting top 10, this can be removed
SELECT @SQL = CONCAT('SELECT TOP 10 ',Cols,' FROM ',tbl,';')
,@tblname = Tbl
FROM #ExportTablesList WHERE Rn = @I


SELECT @tblname = REPLACE(@tblname,'.','_')
SELECT @filename = CASE WHEN SUBSTRING (@ExportPath,LEN(@ExportPath),1) = '/' THEN CONCAT(@ExportPath,@tblname,'.csv')
ELSE CONCAT(@ExportPath,'/',@tblname,'.csv') END

SET @RScript = CONCAT('write.csv(InputDataSet, file = "',@filename,'")')

EXEC sp_execute_external_script
@language = N'R'
,@script = @RScript
,@input_data_1 = @SQL

SET @tblname = ''
SET @filename = ''
SET @I = @I + 1
END

--After successful processing, dropping the table created for exporting adhoc sql into csv
SELECT @tblname= dbo.fngetcsvtblname()
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tblname)
BEGIN
EXEC('DROP TABLE '+@tblname)
END

END ELSE PRINT 'No object specified for exporting to CSV or Objectlist params are empty'

END ELSE PRINT 'Export folder path needs to be mentioned'

END;
GO


Let us see the different execution results:

1) When all the parameters are passed blank



2) When folder path is passed but no objects specified for exporting


3) Notice in the below screenshot, I am getting a unique table name using scalar function and inserting my adhoc sql output into that table and then calling the stored procedure.

This stored procedure exports the adhoc sql output to csv file in the mentioned path and drops the table created after successful processing.


4) I am passing the list of tables from Sales schema in WideWorldImporters database.
This stored procedure exports the list of tables data into separate csv files with table name as file name in the mentioned path.


5) This is to show that Stored procedure can export adhoc sql output, list of tables, views data into separate csv files simultaneously.


Sample execution code block:


SELECT dbo.fngetcsvtblname() --function to get unique table name for inserting adhoc sql into a table
DECLARE @tblname NVARCHAR(128) = dbo.fngetcsvtblname()
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = @tblname)
BEGIN
EXEC('DROP TABLE '+@tblname)
END

/** place your adhoc sql to be exported into csv and SELECT INTO tablename provided by above function **/
SET NOCOUNT ON;
SELECT TOP 5 OL.OrderLineID,O.*
INTO ExportCSV_726_temptbl
FROM [WideWorldImporters].[Sales].[Orders] O
JOIN [Sales].[OrderLines] OL
ON O.OrderID = OL.OrderID

/** place your adhoc sql to be exported into csv and SELECT INTO tablename provided by above function **/

EXEC usp_ExportCSV 'H:\Temp\Blog\ImportExportExcel','Sales.Orders|Sales.Invoices|Website.Customers','|'



See Also:

Import csv or text file into SQL Server table

In this post, let us see how to import csv or text files into a SQL Server table. If you want to import some random files on an adhoc basis, you can make use of the Import Flat File Wizard from SQL Server Management Studio.

If the requirement is to import on a regular basis or to import multiple files simultaneously, then we can do this using a T-SQL script that executes an R script to read csv or text files & load them into SQL Server tables.

In my previous post on SQL Server export adhoc SQL or table to CSV file, I exported a few csv files. I am going to make use of those three csv files for this example.




I have also created a database called "Staging" to load the csv files into different SQL tables, created dynamically with the csv filename as the table name.

The below screenshot shows that executing the below script imports the csv files as SQL Server tables in the Staging database.




DECLARE @RScript NVARCHAR(MAX) =N' 
    library(foreach);
    sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

    list_filespath <- list.files(path = "H:/Temp/Blog/ImportExportExcel", pattern = "*.csv", all.files = FALSE,
           full.names = TRUE, recursive = TRUE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)
    list_filenames <- list.files(path = "H:/Temp/Blog/ImportExportExcel", pattern = "*.csv", all.files = FALSE,
                        full.names = FALSE, recursive = TRUE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = TRUE)

  #Check file exists in path
    if (length(list_filespath) != 0) {

 foreach(m = 1:length(list_filespath)) %do%
 {
   
  sqlTable <- strsplit(list_filenames[m],split = ".csv") 
  sqlTable <- gsub("\\s", "", sqlTable)
  sqlTable <- gsub("[^[:alnum:]]", "", sqlTable)

  Output <- read.csv(file=list_filespath[m], header=TRUE, sep=",")
  sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
  rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)

 }
 }'

EXEC   sp_execute_external_script
      @language = N'R'

     ,@script = @RScript

See Also:

T-SQL : Search for string or phrase in SQL Server database

In this post, let us see how to search for a string / phrase in a SQL Server database using a hybrid solution of the T-SQL LIKE operator & the R grep function. Currently the options that exist in SQL Server to perform a search operation are 
  1. LIKE operator 
  2. Using string functions CHARINDEX, PATINDEX 
  3. Full text search 
Consider the below example: to search for and return only records containing the standalone string "VAT". The expected result is records 1, 5 & 6.

DECLARE @Tmp TABLE (Id INT, Descrip VARCHAR(500))
INSERT @Tmp SELECT 1,'my VAT calculation is incorrect'
INSERT @Tmp SELECT 2,'Private number'
INSERT @Tmp SELECT 3,'Innnovation model'
INSERT @Tmp SELECT 4,'ELEVATE'
INSERT @Tmp SELECT 5,'total VAT'
INSERT @Tmp SELECT 6,'VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE 'VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE '%VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE '%VAT%'
SELECT * FROM @Tmp WHERE Descrip LIKE '% VAT %'
SELECT * FROM @Tmp WHERE Descrip LIKE '% VAT'






As shown in the above example, there is no straightforward option to do an exact search on a string using the first two options mentioned above. However, it is possible with the third option, using the Full text CONTAINS predicate. A full text catalog, unique index & full text index have to be created on the table on which the search operation needs to be performed.
If the exact search of a string needs to be performed on the entire database, then creating a full text catalog, unique index & full text index on each and every table won't be a viable option.
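For comparison, the full-text route would look roughly like the below. This is a sketch only; it assumes a full text catalog and full text index already exist on the table, and the table name is illustrative:

```sql
-- Exact word search via full text search; matches "VAT" as a whole word only
SELECT Id, Descrip
FROM dbo.SearchDemo
WHERE CONTAINS(Descrip, '"VAT"');
```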

With the hybrid approach [T-SQL LIKE operator & R grep function], let us see various search types that can be performed [Pattern Search, Exact Search, Multi pattern search and other search scenario's  – based on collation, case sensitive/insensitive search and complex wildcard search].


Pre-requisites

We have used SQL Server 2019 evaluation edition on Windows 10 64-bit and the WideWorldImporters SQL Server sample database for this example. In this example, we have made use of R Services installed as part of SQL Server. Install R Services and then, from SSMS, enable the external scripting feature. Restart the database engine and then verify the installation as mentioned in MSDN.
The below script / approach will work from SQL Server 2016 and above (as execution of the R language using T-SQL was introduced in SQL Server 2016). Also please note, no additional R packages need to be installed for this approach.
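Enabling the external scripting feature and verifying the R installation can be done with the below commands (run with sysadmin rights, and restart the database engine after reconfiguring):

```sql
-- Enable execution of external scripts (R / Python)
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- After restarting the instance, verify that R executes through T-SQL
EXEC sp_execute_external_script
      @language = N'R'
     ,@script = N'OutputDataSet <- InputDataSet'
     ,@input_data_1 = N'SELECT 1 AS Col'
WITH RESULT SETS ((Col INT));
```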


Explanation on how this approach works


A stored procedure named "usp_SearchString" has been created. This stored procedure can perform normal T-SQL LIKE operations as well as search strings using the R grep function; the behaviour is controlled through an input parameter.
Output of the search operation will be stored in a table named "Tbl_SearchString". The output will also be displayed at the end of stored procedure execution.
Below are the various input parameters of the stored procedure and their usage details:
Parameter - Description
@SearchString - String to be searched
@SearchType - ES = Exact Search using R; PS = Pattern Search using R; MPS = Multi Pattern Search (OR condition) using R; NTLS = Normal T-SQL Like Search
@Match - 0 = LIKE search, 1 = NOT LIKE search
@IgnoreCase - 1 = case insensitive search, 0 = case sensitive search (if @IgnoreCase IS NULL then default: case insensitive search)
@SearchSQLMetadata - Search SQL definitions for presence of the input string. 1 = Search, 0 = Don't search
@SchemaName - List of objects to be searched that fall under the schema (multiple schemas can be passed, separated by comma)
@ObjectlisttoSearch - List of objects to be searched (multiple tables can be passed, separated by comma)
@SearchCollate - For @SearchType = NTLS if @IgnoreCase = 0. To search based on a particular collation; default - COLLATE Latin1_General_CS_AS

If both @ObjectlisttoSearch & @SchemaName are blank then the entire database is searched, including SQL object definitions

Please note:
@SearchType  = ES, PS, MPS uses R script to search string
@SearchType  = NTLS does the normal T-SQL LIKE search

Multi string search should always be delimited by pipe. 
@ObjectlisttoSearch, @SchemaName should always be delimited by comma if multiple values specified.

Script

USE [WideWorldImporters]
GO
  
--Note : Before compiling this SP, search for sqlConnString and provide Databasename, username & password for R SQL connection
CREATE OR ALTER PROC usp_SearchString (  @SearchString NVARCHAR(MAX),
 @SearchType VARCHAR(4),
 @Match BIT,
 @IgnoreCase BIT,
 @SearchSQLMetadata CHAR(1),
 @SchemaName NVARCHAR(50),
 @ObjectlisttoSearch NVARCHAR(MAX),
 @SearchCollate NVARCHAR(500)
 )
  
/*************************************************************************
=================
INPUT PARAMETERS:
=================
  
@SearchString - String to be searched
  
@SearchType  - ES - Exact Search using R
 PS - Pattern Search using R
 MPS - Multi Pattern Search - OR condition using R
 NTLS - Normal T-SQL Like Search
  
@Match - 0 = LIKE Search, 1 = NOT LIKE Search
  
@IgnoreCase - 1 = case insensitive search, 0 = Case sensitive search (If @IgnoreCase IS NULL then default : case insensitive search)
  
@SearchSQLMetadata - Search sql definitions for presence of input string. 1 = Search, 0 = Don't Search
  
@SchemaName  - List of objects to be searched that fall under schema (Multiple schema's can be passed, separated by Comma)
  
@ObjectlisttoSearch  - List of objects to be searched (Multiple table's can be passed, separated by Comma)
  
--IF BOTH @ObjectlisttoSearch & @SchemaName ARE BLANK THEN ENTIRE DATABASE IS SEARCHED INCLUDING SQL DEFINITIONS
  
@SearchCollate - For @SearchType = NTLS if @IgnoreCase  = 0. To search based on particular collation, default - COLLATE Latin1_General_CS_AS
*****************************************************************************/
  
AS
BEGIN
SET NOCOUNT ON;
  
IF @SearchType IN ('ES','PS','MPS','NTLS')
  
 BEGIN
  
 DECLARE @ExecutedBy NVARCHAR(200) = CURRENT_USER
 DECLARE @Serv NVARCHAR(200) = CONCAT(CHAR(39),CHAR(39),@@SERVERNAME,CHAR(39),CHAR(39))
  
  
IF ISNULL(@SchemaName,'') <> ''  OR ISNULL(@ObjectlisttoSearch,'') <> ''
 BEGIN
  
/**** List of table columns to be searched  ****/
DECLARE @TableColList TABLE (Cols NVARCHAR(MAX),colname NVARCHAR(200),Tbl NVARCHAR(128),TblCol NVARCHAR(100),ColType NVARCHAR(150))
INSERT @TableColList
SELECT
 CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp')
 THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
 ELSE C.name END Columns  -- To cover poor data type conversions when passed to R dataframe
 ,C.name
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name,'.',C.name) TblCol
 ,TY.name
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
-- Ignore the datatypes that are not required
WHERE TY.name NOT IN ('geography','varbinary','binary','text','ntext','image','hierarchyid','xml','sql_variant') -- Ignore the datatypes that are not required
AND (Schema_name(T.schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
OR CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoSearch, ',')))
  
 END ELSE
  
 BEGIN
  
 INSERT @TableColList
SELECT
 CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp')
 THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
 ELSE C.name END Columns  -- To cover poor data type conversions when passed to R dataframe
 ,C.name
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name,'.',C.name) TblCol
 ,TY.name
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
-- Ignore the datatypes that are not required
WHERE TY.name NOT IN ('geography','varbinary','binary','text','ntext','image','hierarchyid','xml','sql_variant')
  
 END
  
DROP TABLE IF EXISTS #ExportTablesList
CREATE TABLE #ExportTablesList (Rn BIGINT IDENTITY(1,1),cols NVARCHAR(500),colname NVARCHAR(200),tbl NVARCHAR(200),ColType NVARCHAR(200))
  
IF @SearchSQLMetadata = 1 OR (@SearchSQLMetadata <> 0 AND (ISNULL(@SchemaName,'') = ''  AND ISNULL(@ObjectlisttoSearch,'') = ''))
 BEGIN
  
 INSERT #ExportTablesList (cols,tbl,ColType) SELECT'CONCAT(''<'',object_schema_name(sm.object_id),''.'',object_name(sm.object_id),''|'',o.type_desc COLLATE Latin1_General_100_CI_AS,''>'',sm.definition) AS definition'
 ,'sys.sql_modules AS sm  JOIN sys.objects AS o ON sm.object_id = o.object_id'
 ,'sql_modules'
  
 END
  
--Deduplication of object list
;WITH dedup
AS
(
SELECT *,ROW_NUMBER()OVER(PARTITION BY Tbl,Cols ORDER BY Cols) Rn FROM @TableColList
)
INSERT INTO #ExportTablesList
SELECT cols,colname,tbl,ColType FROM dedup
WHERE Rn = 1
AND tbl <> 'dbo.Tbl_SearchString'
  
 /**** List of table columns to be searched  ****/
  
 IF (SELECT COUNT(1) FROM #ExportTablesList) <> 0
  
 BEGIN
  
 --Table to hold search output
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Tbl_SearchString')
BEGIN
CREATE TABLE [dbo].[Tbl_SearchString] (
[RunId] FLOAT,
[SearchIndex] BIGINT,
[SearchValue] NVARCHAR(MAX),
[NoOfOccurance] FLOAT,
[ObjectName] NVARCHAR(200),
[ColumnNameORDefinition] NVARCHAR(200),
[SqlDatatype] NVARCHAR(200),
[InputParameter] NVARCHAR(800),
[ExecutedBy] NVARCHAR(200),
[ExecutedAt] DATETIME
)
  
END
  
DECLARE @RunId FLOAT
SELECT @RunId = COALESCE(MAX([RunId]),0)+1 FROM [dbo].[Tbl_SearchString]
  
--Processing to store input parameters
DECLARE @Input NVARCHAR(MAX) = CONCAT(
 '@SearchString > ' ,CASE WHEN @SearchString = '' OR @SearchString IS NULL THEN 'NULL' ELSE @SearchString END
,',@SearchType > '  ,CASE WHEN @SearchType = '' OR @SearchType IS NULL THEN 'NULL' ELSE @SearchType END
,',@Match > ' ,COALESCE(@Match,0)
,',@IgnoreCase > '  ,COALESCE(@IgnoreCase,1)
,',@SearchSQLMetadata > '  ,CASE WHEN @SearchSQLMetadata = '' OR @SearchSQLMetadata IS NULL THEN 'NULL' ELSE @SearchSQLMetadata END
,',@SchemaName > '  ,CASE WHEN @SchemaName = '' OR @SchemaName IS NULL THEN 'NULL' ELSE @SchemaName END
,',@ObjectlisttoSearch > '  ,CASE WHEN @ObjectlisttoSearch = '' OR @ObjectlisttoSearch IS NULL THEN 'NULL' ELSE @ObjectlisttoSearch END)
  
--By default case insensitive search
SELECT @IgnoreCase = COALESCE(@IgnoreCase,1)
  
--By default LIKE search
SELECT @Match = COALESCE(@Match,0)
  
IF @SearchType = 'NTLS'
  
BEGIN
  
DECLARE @SearchStrings TABLE (Id INT IDENTITY(1,1),String NVARCHAR(MAX))
  
INSERT @SearchStrings
SELECT value FROM STRING_SPLIT(@SearchString, '|')
  
 UPDATE #ExportTablesList SET Tbl = 'sys.sql_modules', colname = 'definition'
 WHERE ColType = 'sql_modules'
  
 SET @SearchCollate = CASE WHEN @SearchCollate = '' THEN NULL ELSE @SearchCollate END
  
 DECLARE @COLLATE NVARCHAR(100)
 SET @COLLATE = CASE WHEN @IgnoreCase = 0 THEN CASE WHEN @SearchCollate = '' OR @SearchCollate IS NULL THEN ' COLLATE Latin1_General_CS_AS '
 ELSE CONCAT(' COLLATE ',@SearchCollate,'') END
 ELSE CHAR(32) END
  
 DECLARE @SearchOperator NVARCHAR(100)
 SET @SearchOperator = CASE WHEN @Match = 1 THEN ' NOT LIKE ' ELSE ' LIKE ' END
  
 DECLARE @WHEREClause NVARCHAR(MAX)
  
;WITH CTE
AS
(
SELECT  'SearchValue '+ @SearchOperator +''''+String+''''+@COLLATE WhereClause  FROM @SearchStrings
)
  
SELECT @WHEREClause = STUFF(
(SELECT  ' OR ' + WhereClause FROM
(SELECT WhereClause FROM CTE ) AS T FOR XML PATH('')),2,2,'')
  
END
  
SET @SearchString  = CASE WHEN @SearchType = 'ES' THEN REPLACE(@SearchString,'"','') ELSE @SearchString END
  
 /**** Loop through above Objects list and execute R script ****/
 DECLARE @I INT = 1
 ,@SQL NVARCHAR(MAX) = N''
 ,@RScript NVARCHAR(MAX) = N''
 ,@tblname NVARCHAR(128)
 ,@Colname NVARCHAR(200)
 ,@Sqltype NVARCHAR(100)
  
 WHILE @I <= (SELECT MAX(Rn) FROM #ExportTablesList)
 BEGIN
  
 SELECT @SQL = CONCAT('SELECT ',Cols,' FROM ',tbl)
 ,@tblname = Tbl
 ,@Colname = CASE WHEN @SearchType IN ('ES','PS') THEN cols ELSE colname END
 ,@Sqltype = ColType
 FROM #ExportTablesList WHERE Rn = @I
  
IF @SearchType IN ('ES','PS','MPS')
  
BEGIN
  
SET @RScript = '
 #Provide DB credential detail for storing output in a table
 sqlConnString <- "Driver=SQL Server;Server=serv; Database=WideWorldImporters;Uid=sa;Pwd=password"
  
#function to count no of occurrences
 countCharOccurrences <- function(char,string,Type) {
 if (Type == "ES")
 {
 Boundchar <- paste0("\\b",char,"\\b")
 string1 <- gsub(Boundchar,"",string,ignore.case=IgnoreCase)
 } else {
 string1 <- gsub(char,"",string,ignore.case=IgnoreCase)
 }
 return ((nchar(string) - nchar(string1))/nchar(char))
 }
  
 #getting input dataset column name into a variable "c"
 c <- colnames(InputDataSet)
  
 if (SearchType == "ES")
 {
 ExactString <- paste0("\\b",SearchString,"\\b",sep ="")
 Output <-  as.data.frame(grep(ExactString,InputDataSet[[c]],ignore.case = IgnoreCase,invert = Match))
 colnames(Output)[1] <- "SearchIndex"
 Output$SearchValue <- grep(ExactString,InputDataSet[[c]],ignore.case = IgnoreCase,value = TRUE,invert = Match)
 Output$NoOfOccurance <- countCharOccurrences(SearchString,Output$SearchValue,SearchType)
 }
  
 if (SearchType == "PS" || SearchType == "MPS")
 {
 Output <-  as.data.frame(grep(SearchString,InputDataSet[[c]],ignore.case = IgnoreCase,invert = Match))
 colnames(Output)[1] <- "SearchIndex"
 Output$SearchValue <- grep(SearchString,InputDataSet[[c]],ignore.case = IgnoreCase,value = TRUE,invert = Match)
 if (SearchType == "PS") {
 Output$NoOfOccurance <- countCharOccurrences(SearchString,Output$SearchValue,SearchType) }
 }
  
 Output$ObjectName <- rep(tblname,nrow(Output))
 Output$ColumnNameORDefinition <- rep(c,nrow(Output))
 Output$SqlDatatype <- rep(Sqltype,nrow(Output))
 Output$ObjectName[Output$SqlDatatype == "sql_modules"] <- "sql_modules"
 Output$InputParameter <- rep(Input,nrow(Output))
 Output$ExecutedBy <- rep(ExecutedBy,nrow(Output))
 Output$ExecutedAt <- rep(format(Sys.time(),usetz = FALSE),nrow(Output))
 Output$RunId <- rep(RunId,nrow(Output))
  
 sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = "Tbl_SearchString")
 rxDataStep(inData = Output, outFile = sqlDS,append = "rows")
 '
  
EXEC  sp_execute_external_script
 @language = N'R'
 ,@script = @RScript
 ,@input_data_1 = @SQL
 ,@params = N'@SearchString NVARCHAR(MAX),@SearchType VARCHAR(4),@Match BIT,@IgnoreCase BIT,@Input NVARCHAR(MAX)
 ,@tblname NVARCHAR(128),@Sqltype NVARCHAR(150),@ExecutedBy NVARCHAR(200),@RunId FLOAT
 ,@Serv NVARCHAR(200)'
 ,@SearchString = @SearchString
 ,@SearchType = @SearchType
 ,@Match = @Match
 ,@IgnoreCase = @IgnoreCase
 ,@Input = @Input
 ,@tblname = @tblname
 ,@Sqltype = @Sqltype
 ,@ExecutedBy = @ExecutedBy
 ,@RunId = @RunId
 ,@Serv = @Serv
  
END
  
IF @SearchType = 'NTLS'
  
BEGIN
  
INSERT [dbo].[Tbl_SearchString]([RunId],[SearchIndex],[SearchValue],[ObjectName]
 ,[ColumnNameORDefinition],[SqlDatatype],[InputParameter],[ExecutedBy],[ExecutedAt])
EXEC ('SELECT '+@RunId+',SearchIndex,SearchValue,'''+@tblname+''','''+@Colname+''','''+@Sqltype+''','''+@Input+''','''+@ExecutedBy+''', GETDATE()
FROM (SELECT ROW_NUMBER()OVER(ORDER BY (SELECT 1)) SearchIndex,'+@Colname+' AS SearchValue FROM '+@tblname+
' ) Tmp WHERE '+@WHEREClause)
  
END
  
SET @I = @I + 1
END
  
 /**** Loop through above table list and execute R script ****/
  
 --Display final search result
 SELECT * FROM [dbo].[Tbl_SearchString] WHERE RunId = @RunId AND ExecutedBy = CURRENT_USER
  
  
 END
  
 ELSE
  
 SELECT 'No valid objects passed in the InputParameter to search the string' AS InvalidParameter
  
 END
  
 ELSE
  
 SELECT 'SearchType parameter is mandatory ES - Exact Search, PS - Pattern Search,MPS - Multi Pattern Search - OR condition
 ,NTLS - Normal T-SQL Like Search' AS InvalidParameter
  
 END


Sample execution code block:

EXEC usp_SearchString         @SearchString = 'VAT'
            ,@SearchType = 'ES'
  ,@Match = 0  -- 0 = LIKE, 1 = NOT LIKE
  ,@IgnoreCase = 1  -- 1 = Case insensitive, 0 = Case Sensitive
  ,@SearchSQLMetadata= 0 -- 1 = Search, 0 = Don't Search
  ,@SchemaName = ''
  ,@ObjectlisttoSearch = 'dbo.Tmp'
  ,@SearchCollate = ''


Sample script execution screenshots

1. Search string in table from DB 

Example 1: If we want to search for a string "Ava" on Application.People table from WideWorldImporters database, we can try by setting parameters values as shown below :




2. Exact search in table from DB

In example 1, we did a pattern search. If we want to do an exact search for the string "Ava" on the Application.People table from the WideWorldImporters database, we can try setting the parameter values as shown below :



3. Exact case sensitive search

Example 3: In example 2, we did an exact search. If we want to do an exact case sensitive search for the string "Ava" on the Application.People table from the WideWorldImporters database, we can try setting the parameter values as shown below :



4. Exact case sensitive search (NTLS)

If we want to do an exact case sensitive search for the string "male" on the Purchasing.PurchaseOrderLines table from the WideWorldImporters database, we can try setting the parameter values as shown below :



5. Exact case sensitive search (Using R)

In example 4, it returned two rows (records that contain both "male" and "female") as @SearchType was set to NTLS (Normal T-SQL Like search). But we actually expect the result to be one row, the record that contains only "male". If we again do an exact case sensitive search for the string "male" on the Purchasing.PurchaseOrderLines table from the WideWorldImporters database, by setting @SearchType = 'ES' as shown below :




6. Multi string search delimited by pipe

If we want to do a multi string search delimited by pipe (search string "Ava","Amy") on Application.People table from WideWorldImporters database, we can try by setting parameters values as shown below :



Please note the NoOfOccurance field will be populated only for @SearchType = "ES" & "PS" (for single string search without wildcard).

7. Multi string search using normal T-SQL LIKE search

In example 6, multi string search was done using R script. If we want to do the same multi string search using normal T-SQL LIKE search, we can try by setting parameters values as shown below :
Also in this example, let us see how we can search string with specific collation setting.
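Under the hood, a collation-specific NTLS search reduces to T-SQL of the below shape (the table and column names here are illustrative):

```sql
-- Case sensitive multi string LIKE search with an explicit collation
SELECT Id, Descrip
FROM dbo.Tmp
WHERE Descrip LIKE '%Ava%' COLLATE Latin1_General_CS_AS
   OR Descrip LIKE '%Amy%' COLLATE Latin1_General_CS_AS;
```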



8. Fixed pattern search

If we want to do a fixed pattern search, say if we know the string to be searched is a two-letter word, then we can try setting the parameter values as shown below :




9. Search phrase from multiple tables delimited by comma

Example 9: If we want to search for a phrase from multiple tables delimited by comma then we can try by setting parameters values as shown below :





10. Search for a date from multiple schema's delimited by comma

Example 10: If we want to search for a date from multiple schema's delimited by comma then we can try by setting parameters values as shown below :




11. Search string called "Password"  in entire database

Example 11: If we want to search for a string called "Password"  in entire database including SQL object definitions then we can try by setting parameters values as shown below :



12. Wild card search

Example 12: Below example shows how we can do wildcard search when the search is done using R script, refer parameters values as shown below :
To know more about R wild card search using "?","*","^","$", please see the link provided in reference  section.
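As a quick standalone illustration of those R regex anchors (the sample vector is hypothetical, not part of the stored procedure):

```r
x <- c("VAT", "total VAT", "ELEVATE", "my")
grep("^VAT$", x, value = TRUE)   # exact match: only the standalone "VAT"
grep("VAT",   x, value = TRUE)   # pattern match: any value containing "VAT"
grep("^..$",  x, value = TRUE)   # fixed pattern: values exactly two characters long
```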


Tbl_SearchString stores the details about searches made on a string. If we want to see the entire details (all the other fields from a table) of the record that matches the searched string, then we can try as shown below:



--To get details of a particular RunId
SELECT  DISTINCT RunId
 ,[ObjectName]
FROM [WideWorldImporters].[dbo].[Tbl_SearchString]
WHERE RunId = 12
SELECT  A.*
 ,B.*
FROM [WideWorldImporters].[dbo].[Tbl_SearchString] A
JOIN (SELECT row_number()over(order by (SELECT 'A')) Rn,* FROM Warehouse.StockItemHoldings ) B --Change table name
 ON A.SearchIndex = B.Rn
 AND A.ObjectName = 'Warehouse.StockItemHoldings' --Change table name
 AND RunId = 12 --provide run id



See Also:


T-SQL : Data profiling in On-premise SQL Server / Azure SQL database

In this post, let us see how we can do data profiling on On-premise SQL Server / Azure SQL database tables using T-SQL script.

Data profiling provides the below high-level information about the data:

  • Number of rows and size of the data in the object, date for the most recent update of the data and the object schema
  • Number of null records, distinct values for columns and data type of column
  • Minimum, maximum values for numeric columns and check for existence of special characters, leading/trailing spaces for columns of string data type 
Two stored procedures are created to generate the data profiling output. Inside the stored procedures, iteration over each table's columns is performed using system catalog views, applying MIN, MAX and other functions on each column to get the data profiling output.
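The core of that iteration can be sketched as below for a single column (a simplified, hypothetical example with placeholder table and column names; the stored procedures generalise this over all columns returned by the catalog views):

```sql
DECLARE @sql NVARCHAR(MAX);

-- Build a profiling query for one column using the catalog views
SELECT @sql = CONCAT('SELECT COUNT(*) AS TotalRows',
                     ', COUNT(DISTINCT ', QUOTENAME(C.name), ') AS DistinctValues',
                     ', SUM(CASE WHEN ', QUOTENAME(C.name), ' IS NULL THEN 1 ELSE 0 END) AS NullRows',
                     ', MIN(', QUOTENAME(C.name), ') AS MinValue',
                     ', MAX(', QUOTENAME(C.name), ') AS MaxValue',
                     ' FROM ', SCHEMA_NAME(T.schema_id), '.', T.name)
FROM sys.tables T
JOIN sys.columns C ON T.object_id = C.object_id
WHERE T.name = 'Orders' AND C.name = 'OrderDate';  -- placeholder table / column

EXEC sp_executesql @sql;
```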



1) usp_DataProfiling

2) usp_DataProfiling_Metadata




Below are the various input parameters of the stored procedures and their usage details:



1) usp_DataProfiling


Parameter - Description
@Report - 1 = ColumnDataProfiling, 2 = ColumnUniqueValues
@SchemaName - Schema name for which data profiling needs to be done. Multiple values can be passed, separated by comma
@ObjectlisttoSearch - List of tables, separated by comma, for which data profiling needs to be done
@ExcludeTables - Tables to be excluded from the data profiling output. Input format: Schemaname.tablename. Multiple values can be passed, separated by comma
@ExcludeColumns - Columns to be excluded from the data profiling output. Multiple values can be passed, separated by comma
@ExcludeDataType - Data types to be excluded from the data profiling output. Multiple values can be passed, separated by comma. These data types are excluded by default as they are not really relevant for data profiling: 'geography', 'varbinary', 'binary', 'text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant'
@RestrictCharlength - If a field's maximum character length is greater than 100, the resultset shows only one record describing the details of that field; otherwise the resultset shows one record per unique value in that field. This default of 100 can be overwritten using this parameter
@RestrictNoOfUniqueValues - If a field has more than 50 unique values, the resultset shows only one record describing the details of that field; otherwise the resultset shows one record per unique value in that field. This default of 50 can be overwritten using this parameter
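A sample invocation of usp_DataProfiling might look like the below (the parameter values are illustrative):

```sql
EXEC usp_DataProfiling
      @Report = 1                           -- ColumnDataProfiling
     ,@SchemaName = N''
     ,@ObjectlisttoSearch = N'Sales.Orders,Sales.Invoices'
     ,@ExcludeTables = N''
     ,@ExcludeColumns = N''
     ,@ExcludeDataType = N''
     ,@RestrictCharlength = NULL            -- defaults to 100
     ,@RestrictNoOfUniqueValues = NULL      -- defaults to 50
```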




2) usp_DataProfiling_Metadata


Parameter - Description
@Report - 1 = TableStats, 2 = TableColumnMetadata
@SchemaName - Schema name for which data profiling needs to be done. Multiple values can be passed, separated by comma
@ObjectlisttoSearch - List of tables, separated by comma, for which data profiling needs to be done


Data Profiling using T-SQL script:


1) usp_DataProfiling


CREATEORALTERPROCEDUREusp_DataProfiling
 @Report TINYINT ,  --1 = 'ColumnDataProfiling', 2 = 'ColumnUniqueValues'
 @SchemaName NVARCHAR(MAX) = N'',
 @ObjectlisttoSearch NVARCHAR(MAX),
 @ExcludeTables NVARCHAR(MAX) = N'',
 @ExcludeColumns NVARCHAR(MAX) = N'',
 @ExcludeDataType NVARCHAR(100) = N'',
 @RestrictCharlength INT,
 @RestrictNoOfUniqueValues INT
AS
 BEGIN
 SETNOCOUNT ON;
 SETANSI_WARNINGS OFF;
 SETANSI_NULLS ON;
 SELECT@RestrictCharlength = IIF(@RestrictCharlength ISNULLOR@RestrictCharlength = '',100,@RestrictCharlength)
 SELECT@RestrictNoOfUniqueValues = IIF(@RestrictNoOfUniqueValues ISNULLOR@RestrictNoOfUniqueValues = '',50,@RestrictNoOfUniqueValues)
     
DECLARE@TableColList TABLE(Id INTIDENTITY(1,1),Tbl NVARCHAR(128),colname NVARCHAR(200),ColType NVARCHAR(150))
 IF ISNULL(@SchemaName,'') <> ''  ORISNULL(@ObjectlisttoSearch,'') <> ''
 BEGIN
    
INSERT@TableColList
SELECT    DISTINCTCONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
         ,C.name
         ,CASEWHENTY.is_user_defined = 1 THEN(SELECTnameFROMsys.types
                                                 WHEREsystem_type_id = user_type_id
                                                    AND  system_type_id =  TY.system_type_id)
                                            ELSETY.name
          END
FROMSys.tables T
JOINsys.columns C
    ONT.object_id = C.object_id
JOINsys.types TY
    ONC.[user_type_id] = TY.[user_type_id]
-- Ignore the datatypes that are not required
WHERETY.nameNOTIN('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')
    AND(Schema_name(T.schema_id) IN(SELECTvalue FROMSTRING_SPLIT(@SchemaName, ','))
    ORCONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN(SELECTvalue FROMSTRING_SPLIT(@ObjectlisttoSearch, ',')))
    AND(TY.nameNOTIN(SELECTvalue FROMSTRING_SPLIT(@ExcludeDataType, ','))
    ANDTY.name= TY.name)
    AND(C.nameNOTIN(SELECTvalue FROMSTRING_SPLIT(@ExcludeColumns, ','))
    ANDC.name= C.name)
    AND(CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) NOTIN(SELECTvalue FROMSTRING_SPLIT(@ExcludeTables, ','))
    ANDCONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) = CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name))
 END ELSE

 BEGIN

 INSERT @TableColList
SELECT DISTINCT CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
          ,C.name
          ,CASE WHEN TY.is_user_defined = 1 THEN (SELECT name FROM sys.types
                                                 WHERE system_type_id = user_type_id
                                                    AND system_type_id = TY.system_type_id)
                                            ELSE TY.name
          END
FROM sys.tables T
JOIN sys.columns C
    ON T.object_id = C.object_id
JOIN sys.types TY
    ON C.[user_type_id] = TY.[user_type_id]
-- Ignore the data types that are not required
WHERE TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')
    AND (TY.name NOT IN (SELECT value FROM STRING_SPLIT(@ExcludeDataType, ','))
    AND TY.name = TY.name)
    AND (C.name NOT IN (SELECT value FROM STRING_SPLIT(@ExcludeColumns, ','))
    AND C.name = C.name)
    AND (CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) NOT IN (SELECT value FROM STRING_SPLIT(@ExcludeTables, ','))
    AND CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) = CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name))

 END
DROP TABLE IF EXISTS #Final
CREATE TABLE #Final (Id BIGINT IDENTITY(1,1),TableName NVARCHAR(128),ColumnName NVARCHAR(200),ColumnType NVARCHAR(150),ColumnUniqueValues NVARCHAR(MAX),UniqueValueOccurance BIGINT,MissingDataRowCount BIGINT,MinValue NVARCHAR(MAX),MaxValue NVARCHAR(MAX),SpecialCharacters BIGINT,LeadingTrailingSpaces BIGINT,MinFieldValueLen BIGINT,MaxFieldValueLen BIGINT,Comment NVARCHAR(MAX))
DROP TABLE IF EXISTS #temp
CREATE TABLE #temp(Id BIGINT IDENTITY(1,1),TableName NVARCHAR(128),ColumnName NVARCHAR(200),Cnt BIGINT,MaxLen BIGINT,MinLen BIGINT,MissingDataCount BIGINT,MinValue NVARCHAR(MAX),MaxValue NVARCHAR(MAX),SpecialCharacters BIGINT,LeadingTrailingSpaces BIGINT)
DECLARE @I                     INT = 1
       ,@SQL                   NVARCHAR(MAX) = N''
       ,@tblname               NVARCHAR(128)
       ,@Colname               NVARCHAR(200)
       ,@ColType               NVARCHAR(150)
       ,@Cnt                   BIGINT
       ,@MaxLen                BIGINT
       ,@MinLen                BIGINT
       ,@MissingData           BIGINT
       ,@MaxVal                NVARCHAR(MAX) = N''
       ,@MinVal                NVARCHAR(MAX) = N''
       ,@MinMAxSQL             NVARCHAR(MAX) = N''
       ,@SpecialCharacters     BIGINT
       ,@LeadingTrailingSpaces BIGINT
  WHILE @I <= (SELECT MAX(Id) FROM @TableColList)
  BEGIN
  SELECT @Colname = QUOTENAME(colname),@tblname = Tbl,@ColType = ColType FROM @TableColList
  WHERE Id = @I
SELECT @MinMAxSQL = CASE WHEN @ColType IN ('date','datetime','datetime2','datetimeoffset','time','timestamp')
                         THEN CONCAT(' FORMAT (MIN(',@Colname,'), ''yyyy-MM-dd,hh:mm:ss'') MinValue,FORMAT (MAX(',@Colname,'), ''yyyy-MM-dd,hh:mm:ss'') MAXValue')
                         WHEN @ColType = 'bit'
                         THEN '0 AS MinValue,1 AS MaxValue'
                         ELSE CONCAT('CASE WHEN EXISTS (SELECT 1 FROM ',@tblname,' WHERE ISNUMERIC(',@Colname,') = 0) ','THEN NULL ELSE MIN(',@Colname,') END MinValue
                             ,CASE WHEN EXISTS (SELECT 1 FROM ',@tblname,' WHERE ISNUMERIC(',@Colname,') = 0) ','THEN NULL ELSE MAX(',@Colname,') END MAXValue')
                     END
EXEC(';WITH CTE AS (
        SELECT   COUNT_BIG(DISTINCT '+@Colname+') Cnt
                ,MAX(LEN('+@Colname+')) MaxLen
                ,MIN(LEN('+@Colname+')) MinLen
                ,SUM(CASE WHEN '+@Colname+' IS NULL OR CAST('+@Colname+' AS VARCHAR(MAX)) = '''' THEN 1 ELSE 0 END) MissingData
                ,'+@MinMAxSQL+'
                ,CASE WHEN '''+@ColType+''' IN (''nvarchar'',''varchar'',''nchar'',''char'')
                      THEN SUM(CASE WHEN '+@Colname+' LIKE ''%[^a-zA-Z0-9 ]%'' THEN 1 ELSE 0 END)
                      ELSE NULL END SpecialCharacters
                ,CASE WHEN '''+@ColType+''' IN (''nvarchar'',''varchar'',''nchar'',''char'')
                      THEN SUM(CASE WHEN ISNULL(DATALENGTH('+@Colname+'),'''') = ISNULL(DATALENGTH(RTRIM(LTRIM('+@Colname+'))),'''') THEN 0 ELSE 1 END)
                      ELSE NULL END LeadingTrailingSpaces
        FROM '+@tblname+' )
        INSERT #temp(TableName,ColumnName,Cnt,MaxLen,MinLen,MissingDataCount,MinValue,MaxValue,SpecialCharacters,LeadingTrailingSpaces)
        SELECT '''+@tblname+''','''+@Colname+''',Cnt,ISNULL(MaxLen,0) MaxLen,ISNULL(MinLen,0) MinLen,ISNULL(MissingData,0) MissingData,MinValue,MAXValue
        ,ISNULL(SpecialCharacters,0) SpecialCharacters,ISNULL(LeadingTrailingSpaces,0) LeadingTrailingSpaces FROM CTE')

  SELECT @Cnt = Cnt,@MaxLen = MaxLen,@MinLen = MinLen,@MissingData = MissingDataCount,@MinVal = MinValue,@MaxVal = MAXValue
        ,@SpecialCharacters = SpecialCharacters,@LeadingTrailingSpaces = LeadingTrailingSpaces
  FROM #temp
  WHERE Id = @I AND TableName = @tblname AND ColumnName = @Colname
  IF ISNULL(@MaxLen,'') < @RestrictCharlength AND ISNULL(@Cnt,'') < @RestrictNoOfUniqueValues
      BEGIN

      SET @SQL = CONCAT('SELECT ''',@tblname,''',''',@Colname,''',''',@ColType,''',',@Colname,',COUNT_BIG(',@Colname,'),',@MissingData,',''',@MinVal,''',''',@MaxVal,''',',@SpecialCharacters,',',@LeadingTrailingSpaces,',',@MinLen,',',@MaxLen,',','''','This field has Unique values = ',@Cnt,'''',' FROM ',@tblname,' GROUP BY ',@Colname)
      INSERT #Final (TableName,ColumnName,ColumnType,ColumnUniqueValues,UniqueValueOccurance,MissingDataRowCount,MinValue,MaxValue,SpecialCharacters,LeadingTrailingSpaces,MinFieldValueLen,MaxFieldValueLen,Comment)
      EXEC(@SQL)
      END
  ELSE
      BEGIN
      INSERT #Final (TableName,ColumnName,ColumnType,MissingDataRowCount,MinValue,MaxValue,SpecialCharacters,LeadingTrailingSpaces,MinFieldValueLen,MaxFieldValueLen,Comment)
      SELECT @tblname,@Colname,@ColType,@MissingData,@MinVal,@MaxVal,@SpecialCharacters,@LeadingTrailingSpaces,@MinLen,@MaxLen,CONCAT('This field has Unique values = ',@Cnt)
      END
  SET @I = @I + 1
  END
  IF @Report = 1
  BEGIN
  SELECT DISTINCT TableName,ColumnName,ColumnType,MissingDataRowCount,MinValue,MaxValue,SpecialCharacters
                 ,LeadingTrailingSpaces,MinFieldValueLen,MaxFieldValueLen,Comment
  FROM #Final
  ORDER BY TableName,ColumnName
  END

 IF @Report = 2
 BEGIN
  SELECT TableName,ColumnName,ColumnUniqueValues,UniqueValueOccurance,Comment
  FROM #Final
  ORDER BY TableName,ColumnName
 END
  END
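The per-column statistics that the dynamic SQL gathers (distinct count, missing count, min/max, field value lengths) can be sketched in plain Python to make the logic easier to follow. The sample column data and the `profile_column` helper below are hypothetical illustrations, not part of the stored procedure:

```python
# Sketch of the per-column profiling performed by the dynamic SQL above.
# The sample data and the profile_column helper are hypothetical.

def profile_column(values):
    """Return distinct count, missing count, min/max and value lengths for one column."""
    present = [v for v in values if v not in (None, "")]  # NULL or '' counts as missing
    return {
        "UniqueValues": len(set(present)),
        "MissingDataRowCount": len(values) - len(present),
        "MinValue": min(present) if present else None,
        "MaxValue": max(present) if present else None,
        "MinFieldValueLen": min(len(str(v)) for v in present) if present else 0,
        "MaxFieldValueLen": max(len(str(v)) for v in present) if present else 0,
    }

city = ["Perth", "", "Sydney", "Perth", None]
stats = profile_column(city)
print(stats)
```

The stored procedure does the same per column, but pushes the aggregation into SQL Server via EXEC so the data never leaves the database.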


2) usp_DataProfiling_Metadata


CREATE OR ALTER PROCEDURE usp_DataProfiling_Metadata
 @Report TINYINT,  --1 = 'TableStats', 2 = 'TableColumnMetadata'
 @SchemaName NVARCHAR(MAX) = N'',
 @ObjectlisttoSearch NVARCHAR(MAX) = N''
AS
 BEGIN
 SET NOCOUNT ON;
DROP TABLE IF EXISTS #TblList
CREATE TABLE #TblList(Id INT IDENTITY(1,1),TableName NVARCHAR(200))
DROP TABLE IF EXISTS #Tblstats
CREATE TABLE #Tblstats (TableName NVARCHAR(200),NoOfRows NVARCHAR(100),ReservedSpace NVARCHAR(100)
                       ,DataSpace NVARCHAR(100),IndexSize NVARCHAR(100),UnusedSpace NVARCHAR(100)
                       ,LastUserUpdate DATETIME)
 IF ISNULL(@SchemaName,'') <> '' OR ISNULL(@ObjectlisttoSearch,'') <> ''
 BEGIN
INSERT #TblList (TableName)
SELECT CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) TableName
FROM sys.tables
WHERE (Schema_name(schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
    OR CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoSearch, ',')))
 END ELSE
 BEGIN
INSERT #TblList (TableName)
SELECT CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) TableName
FROM sys.tables
 END
DECLARE @Tblstats TABLE(TableName NVARCHAR(200),NoOfRows NVARCHAR(100),ReservedSpace NVARCHAR(100)
                       ,DataSpace NVARCHAR(100),IndexSize NVARCHAR(100),UnusedSpace NVARCHAR(100)
                       )
DECLARE @I                INT = 1
       ,@tblname          NVARCHAR(128) = N''
       ,@last_user_update DATETIME
WHILE @I <= (SELECT COUNT(1) FROM #TblList)
BEGIN
SELECT @tblname = TableName FROM #TblList WHERE Id = @I
INSERT @Tblstats
EXEC sp_spaceused @tblname;
SET @last_user_update = NULL -- reset so a table with no usage stats does not inherit the previous table's value
SELECT TOP 1 @last_user_update = last_user_update
FROM sys.dm_db_index_usage_stats
WHERE object_id = OBJECT_ID(@tblname)
ORDER BY last_user_update DESC
INSERT #Tblstats(TableName,NoOfRows,ReservedSpace,DataSpace,IndexSize,UnusedSpace,LastUserUpdate)
SELECT @tblname,NoOfRows,ReservedSpace,DataSpace,IndexSize,UnusedSpace,@last_user_update
FROM @Tblstats
DELETE FROM @Tblstats
SET @I = @I + 1
END
  IF @Report = 1
  BEGIN
 ;WITH Systbl
 AS
 (
  SELECT DISTINCT CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) TableName
        ,modify_date TableSchema_LastModifyDate
        ,CASE WHEN is_replicated = 1 THEN 'Yes' ELSE 'No' END AS IsReplicated
        ,CASE WHEN is_filetable = 1 THEN 'Yes' ELSE 'No' END AS IsFileTable
        ,CASE WHEN is_memory_optimized = 1 THEN 'Yes' ELSE 'No' END AS IsMemoryOptimized
        ,temporal_type_desc TemporalTypeDesc
        ,CASE WHEN is_remote_data_archive_enabled = 1 THEN 'Yes' ELSE 'No' END AS IsStretchEnabled
        ,CASE WHEN is_external = 1 THEN 'Yes' ELSE 'No' END AS IsExternal
        ,CASE WHEN is_node = 1 OR is_edge = 1 THEN 'Yes' ELSE 'No' END IsGraphTable
 FROM sys.tables ST
 JOIN #TblList T
 ON CONCAT(SCHEMA_NAME(SCHEMA_ID),'.',name) COLLATE DATABASE_DEFAULT = T.TableName COLLATE DATABASE_DEFAULT
 )
SELECT B.*,A.TableSchema_LastModifyDate
,A.IsMemoryOptimized
,A.IsExternal
,A.IsStretchEnabled
,A.IsFileTable
,A.IsGraphTable
,A.IsReplicated
,A.TemporalTypeDesc
FROM Systbl A
JOIN #Tblstats B
ON A.TableName COLLATE DATABASE_DEFAULT = B.TableName COLLATE DATABASE_DEFAULT
  END
   
 IF @Report = 2
 BEGIN
  SELECT DISTINCT CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
         ,C.name ColumnName
         ,CASE WHEN TY.is_user_defined = 1 THEN (SELECT name FROM sys.types
                                                 WHERE system_type_id = user_type_id
                                                    AND system_type_id = TY.system_type_id)
                                            ELSE TY.name
          END AS DataType
          ,C.max_length
          ,C.precision
          ,C.scale
          ,C.collation_name
          ,CASE WHEN C.is_nullable = 1 THEN 'Yes' ELSE 'No' END AS IsNullable
          ,CASE WHEN C.is_identity = 1 THEN 'Yes' ELSE 'No' END AS IsIdentity
          ,CASE WHEN C.is_masked = 1 THEN 'Yes' ELSE 'No' END AS IsMasked
          ,CASE WHEN C.is_hidden = 1 THEN 'Yes' ELSE 'No' END AS IsHidden
          ,CASE WHEN C.is_computed = 1 THEN 'Yes' ELSE 'No' END AS IsComputed
          ,CASE WHEN C.is_filestream = 1 THEN 'Yes' ELSE 'No' END AS IsFileStream
          ,CASE WHEN C.is_sparse = 1 THEN 'Yes' ELSE 'No' END AS IsSparse
          ,C.encryption_type_desc EncryptionTypeDesc
FROM sys.tables T
JOIN sys.columns C
    ON T.object_id = C.object_id
JOIN sys.types TY
    ON C.[user_type_id] = TY.[user_type_id]
WHERE (Schema_name(T.schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
    OR CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoSearch, ',')))
 END
  END
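The loop in this procedure takes the TOP 1 row ordered by last_user_update descending, i.e. the most recent update time across all index-usage rows for a table. The same "latest value" selection can be sketched in Python; the rows below are hypothetical stand-ins for sys.dm_db_index_usage_stats:

```python
from datetime import datetime

# Pick the most recent last_user_update across all index-usage rows,
# mirroring the TOP 1 ... ORDER BY last_user_update DESC step above.
# These rows are hypothetical stand-ins for sys.dm_db_index_usage_stats.
usage_rows = [
    {"index_id": 1, "last_user_update": datetime(2020, 3, 1, 9, 0)},
    {"index_id": 2, "last_user_update": datetime(2020, 3, 5, 17, 30)},
    {"index_id": 3, "last_user_update": datetime(2020, 2, 28, 8, 15)},
]
last_user_update = max(r["last_user_update"] for r in usage_rows)
print(last_user_update)
```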





Please see my technet wiki article for detailed explanation on this topic with examples and screenshots and you can also download the scripts and SSRS report from technet gallery site.


SQL Server Import/Export to Excel using Python script

Introduction

In a previous article, we saw how to import/export Excel to/from SQL Server by executing an R script within T-SQL.
In this post, let us see a similar approach to import Excel into SQL Server and export SQL Server data to Excel by executing a Python script within T-SQL.

There are existing methods to do this using BCP, BULK INSERT, the Import & Export wizard in SSMS, SSIS, Azure Data Factory, linked server & OPENROWSET queries, and SQLCMD. Each has limitations:

BULK INSERT, the BCP tool, and Azure Data Factory can't read Excel files directly
BCP - a workaround is needed to include the header row while exporting
SSIS - though it supports exporting to Excel, with a dynamic source & destination, handling the mapping between source and target increases the complexity of the package
SQLCMD - cannot export output in Excel file format


The R & Python language extensions were introduced in SQL Server 2016 & 2017 respectively as part of machine learning. With support for R in Azure SQL Database and Java language extension support in SQL Server 2019, this new approach can be used extensively as it is easy, fast and flexible.




Pre-requisites

We have used SQL Server 2019 evaluation edition on Windows 10 64-bit and the WideWorldImporters SQL Server sample database for this example.

The scripts below work on SQL Server 2017 and above, as execution of the Python language using T-SQL was introduced in SQL Server 2017.


Provide folder permission so the external runtime can access the Excel files during the import/export process: right-click the folder -> Properties -> Security -> grant Full Control to "ALL_APPLICATION_PACKAGES".



pandas.DataFrame.to_excel & pandas.read_excel are used to export and import Excel; pandas is installed by default with the Python machine learning components.


We can check that by running below statement in SSMS:

EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'import pip
for i in pip.get_installed_distributions():
    print(i)';

GO
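Note that `pip.get_installed_distributions()` was removed in newer pip releases. Outside SQL Server, a minimal stand-alone equivalent on Python 3.8+ uses the standard-library `importlib.metadata` (this snippet is illustrative and is not part of the stored procedures):

```python
from importlib import metadata

# List installed distribution names using the Python 3.8+ standard library,
# an alternative to the now-removed pip.get_installed_distributions().
installed = sorted(d.metadata["Name"] for d in metadata.distributions() if d.metadata["Name"])
for name in installed:
    print(name)
```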


Exporting SQL Server data (list of tables, views) into Excel file



For this approach, we have created a stored procedure named "usp_ExportExcel" in the "WideWorldImporters" database. Executing the stored procedure with the input parameters below exports SQL Server data to Excel files.

The stored procedure has the below three input parameters and writes the output to Excel files:


Parameter            Description
@ExportPath          Path for exporting Excel files
@SchemaName          List of objects under this schema to be exported. Can have multiple values separated by comma
@ObjectlisttoExport  List of tables, views to be exported. Can have multiple values separated by comma


CREATE OR ALTER PROC usp_ExportExcel (@ExportPath NVARCHAR(MAX),
                                      @SchemaName NVARCHAR(MAX),
                                      @ObjectlisttoExport NVARCHAR(MAX)
                                      )
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY

IF ISNULL(@ExportPath,'') <> ''
 BEGIN
 SELECT @ExportPath = CASE WHEN RIGHT(@ExportPath,1) = '\' THEN @ExportPath ELSE CONCAT(@ExportPath,'\') END
 DECLARE @ValidPath TABLE (ValidPathCheck BIT)
INSERT @ValidPath
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import os
import pandas as pd
d = os.path.isdir(ExportFilePath)
OutputDataSet = pd.DataFrame([d],columns=["Filename"])'
,@params = N'@ExportFilePath NVARCHAR(MAX)'
,@ExportFilePath = @ExportPath
        IF (SELECT ValidPathCheck FROM @ValidPath) = 1
BEGIN
  
 IF ISNULL(@SchemaName,'') <> '' OR ISNULL(@ObjectlisttoExport,'') <> ''
 BEGIN

DROP TABLE IF EXISTS #ExportTablesList, #FinalExportList

CREATE TABLE #ExportTablesList(TableCount INT IDENTITY(1,1),Cols NVARCHAR(MAX),TableName NVARCHAR(200))
--Get the list of objects to be exported
INSERT #ExportTablesList (Cols,TableName)
SELECT CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp','decimal','bit','int','bigint')
              THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
         ELSE C.name END Cols  -- To cover poor data type conversions b/n Python & SQL Server
  ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
-- Ignore the data types that are not required to be exported
AND TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')

INSERT #ExportTablesList (Cols,TableName)
SELECT CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp','decimal','bit','int','bigint')
              THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
         ELSE C.name END Cols  -- To cover poor data type conversions b/n Python & SQL Server
  ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoExport, ','))
-- Ignore the data types that are not required to be exported
AND TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')

--Dedup of object list
;WITH dedup
AS
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY TableName,Cols ORDER BY Cols) Rn FROM #ExportTablesList
)
DELETE FROM dedup
WHERE Rn > 1
--Forming columns list as comma separated
SELECT TableName,IDENTITY(INT,1,1) AS TableCount
    , STUFF(
        (
        SELECT ', ' + C.Cols
        FROM #ExportTablesList AS C
        WHERE C.TableName = T.TableName
        FOR XML PATH('')
        ), 1, 2, '') AS Cols
INTO #FinalExportList
FROM #ExportTablesList AS T
GROUP BY TableName
  
  
DECLARE @I INT = 1
       ,@TableName NVARCHAR(200)
       ,@SQL NVARCHAR(MAX) = N''
       ,@PythonScript NVARCHAR(MAX) = N''
       ,@ExportFilePath NVARCHAR(MAX) = N''
--Loop through the object list to export as Excel
WHILE @I <= (SELECT COUNT(TableName) FROM #FinalExportList)
BEGIN

 -- Just for testing purposes, only the top 10 records are selected
SELECT @SQL = CONCAT('SELECT TOP 10 ',Cols,' FROM ',TableName,';')
      ,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I

SET @PythonScript = N'
FullFilePath = ExcelFilePath+TableName+".xlsx"
InputDataSet.to_excel(FullFilePath,sheet_name=TableName.split(".")[-1],index=False)'

EXEC sp_execute_external_script
      @language = N'Python'
     ,@script = @PythonScript
     ,@input_data_1 = @SQL
     ,@params = N'@ExcelFilePath NVARCHAR(MAX), @TableName NVARCHAR(200)'
     ,@ExcelFilePath = @ExportPath -- file path where Excel files are placed
     ,@TableName = @TableName

SET @I = @I + 1
END

 END ELSE PRINT 'Schema name of objects or list of objects (separated by comma) to be exported needs to be mentioned'
 END ELSE PRINT 'Invalid folder path'
 END ELSE PRINT 'Export folder path needs to be mentioned'
 END TRY
BEGIN CATCH
PRINT 'Issue while executing this SP, please check whether there is permission to execute the script / to access the folder and input params are valid'
END CATCH

END
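The "Forming columns list as comma separated" step above (STUFF ... FOR XML PATH) groups the (table, column) rows by table and joins the columns with ", ". The same shaping can be sketched in plain Python; the sample rows are hypothetical stand-ins for #ExportTablesList:

```python
from itertools import groupby

# Group (table, column) pairs by table and join the columns with ", ",
# mirroring the STUFF ... FOR XML PATH step above.
# The sample rows are hypothetical stand-ins for #ExportTablesList.
rows = [
    ("Sales.Orders", "OrderID"),
    ("Sales.Orders", "OrderDate"),
    ("Sales.Invoices", "InvoiceID"),
]
rows.sort(key=lambda r: r[0])  # groupby requires sorted input
col_lists = {tbl: ", ".join(col for _, col in grp)
             for tbl, grp in groupby(rows, key=lambda r: r[0])}
print(col_lists)
```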




Importing Excel file into SQL Server


For this approach, we have created a stored procedure named "usp_ImportExcel" in the "WideWorldImporters" database. Executing the stored procedure with the input parameters below imports Excel files into SQL Server tables.


The stored procedure has the below six input parameters:


Parameter            Description
@ImportPath          Path where Excel files are placed for importing into SQL Server
@DBConnectionString  Target SQL Server database connection string where files are imported. Can be
                     Database=DB name;Trusted_Connection=True
                     or
                     Database=DB name;Uid=user name;Pwd=Password
@ImportAll           If set to 1 then all files in the mentioned path are imported. If set to 0 then only the mentioned file is imported
@CombineTarget       Flag to decide a single target table for source files with the same structure, or a separate target table for each source file
@ExcelFileName       If @ImportAll = 0 then the Excel file name needs to be passed to this parameter
@ExcelSheetName      If @ImportAll = 0 then the corresponding sheet name of the Excel file needs to be passed to this parameter

CREATE OR ALTER PROC usp_ImportExcel (@ImportPath NVARCHAR(MAX),
                                      @DBConnectionString NVARCHAR(MAX),
                                      @ImportAll BIT,
                                      @CombineTarget BIT,
                                      @ExcelFileName NVARCHAR(200),
                                      @ExcelSheetName NVARCHAR(50)
                                      )
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY

IF ISNULL(@ImportPath,'') <> '' AND ISNULL(@DBConnectionString,'') <> ''
 BEGIN
 SELECT @ImportPath = CASE WHEN RIGHT(@ImportPath,1) = '\' THEN @ImportPath ELSE CONCAT(@ImportPath,'\') END
 DECLARE @Serv NVARCHAR(200) = CONCAT(CHAR(39),CHAR(39),@@SERVERNAME,CHAR(39),CHAR(39))
  DECLARE @ValidPath TABLE (ValidPathCheck BIT)
INSERT @ValidPath
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import os
import pandas as pd
d = os.path.isdir(ImportFilePath)
OutputDataSet = pd.DataFrame([d],columns=["Filename"])'
,@params = N'@ImportFilePath NVARCHAR(MAX)'
,@ImportFilePath = @ImportPath
        IF (SELECT ValidPathCheck FROM @ValidPath) = 1
BEGIN

  IF (@ImportAll = 0 AND (ISNULL(@ExcelFileName,'') <> '' AND ISNULL(@ExcelSheetName,'') <> ''))
  OR (@ImportAll = 1 AND (ISNULL(@ExcelFileName,'') = '' AND ISNULL(@ExcelSheetName,'') = ''))

  BEGIN
        
     
DECLARE @PythonScript NVARCHAR(MAX) = CONCAT('
import pandas as pd
import os
import glob
from revoscalepy import RxSqlServerData, rx_data_step
sqlConnString = "Driver=SQL Server;Server="+Serv+";',@DBConnectionString,'"
Filefolderepath = ImportFilePath+"*.xlsx"
if ImportAll == 1 and CombineTarget == 0:
    for FullFilePath in glob.glob(Filefolderepath):
       Filename = os.path.basename(FullFilePath).replace(".xlsx","")
       xl = pd.ExcelFile(FullFilePath)
       for sheetname in xl.sheet_names:
          Output = pd.read_excel(FullFilePath, sheetname=sheetname, na_filter=False).astype(str)
       if not Output.empty:
          sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "".join(fl for fl in Filename if fl.isalnum())+"_"+"".join(sh for sh in sheetname if sh.isalnum()))
          rx_data_step(input_data = Output, output_file = sqlDS,overwrite = True)
if ImportAll == 1 and CombineTarget == 1:
   df2 = pd.DataFrame()
   for FullFilePath in glob.glob(Filefolderepath):
       Filename = os.path.basename(FullFilePath).replace(".xlsx","")
       xl = pd.ExcelFile(FullFilePath)
       for sheetname in xl.sheet_names:
           Output = pd.read_excel(FullFilePath, sheetname=sheetname).columns.astype(str)
           Output = ",".join(list(Output))
           df1 = pd.DataFrame([[Filename,sheetname,FullFilePath,Output]],columns=["Filename","sheetname","FullFilePath","Headers"])
           df2 = df2.append(df1,ignore_index=True)
   sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "Tbl_PyImpExp1")
   rx_data_step(input_data = df2, output_file = sqlDS,overwrite = True)
if ImportAll == 0:
   Filename = ImportFilePath+ExcelFileName+".xlsx"
   exists = os.path.isfile(Filename)
   if exists and ExcelSheetName in pd.ExcelFile(Filename).sheet_names:
         Output = pd.read_excel(Filename, sheetname=ExcelSheetName, na_filter=False).astype(str)
         if not Output.empty:
             sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "".join(fl for fl in ExcelFileName if fl.isalnum())+"_"+"".join(sh for sh in ExcelSheetName if sh.isalnum()))
             rx_data_step(input_data = Output, output_file = sqlDS,overwrite = True)
   else:
      print("Invalid Excel file or sheet name")')
  
EXEC sp_execute_external_script
      @language = N'Python'
     ,@script = @PythonScript
     ,@params = N'@ImportFilePath NVARCHAR(MAX),@ImportAll BIT,@CombineTarget BIT,@ExcelFileName NVARCHAR(200),@ExcelSheetName NVARCHAR(50),@Serv NVARCHAR(200)'
     ,@ImportFilePath = @ImportPath
     ,@ImportAll = @ImportAll
     ,@CombineTarget = @CombineTarget
     ,@ExcelFileName = @ExcelFileName
     ,@ExcelSheetName = @ExcelSheetName
     ,@Serv = @Serv
     IF @ImportAll = 1 AND @CombineTarget = 1
     BEGIN
        IF OBJECT_ID('Tbl_PyImpExp1') IS NOT NULL
            BEGIN
                    DROP TABLE IF EXISTS Tbl_PyImpExp2
                    ;WITH FileList
                    AS (
                    SELECT [Filename]
                          ,[sheetname]
                          ,[Headers]
                          ,[FullFilePath]
                          ,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) Rn
                          ,ROW_NUMBER() OVER(PARTITION BY [Headers] ORDER BY [Headers]) Grp
                          ,DENSE_RANK() OVER(ORDER BY [Headers]) Grp1
                      FROM [dbo].[Tbl_PyImpExp1]
                      )
                      SELECT *,FIRST_VALUE([Filename]) OVER (PARTITION BY Grp1 ORDER BY Grp ASC) AS TableName
                      INTO Tbl_PyImpExp2
                      FROM FileList
            END
                IF EXISTS (SELECT 1 FROM Tbl_PyImpExp2)
                BEGIN
                     DECLARE @I INT = 1
                            ,@SQL NVARCHAR(MAX) = N''
SET @PythonScript = CONCAT('
import pandas as pd
from revoscalepy import RxSqlServerData, rx_data_step
sqlConnString = "Driver=SQL Server;Server="+Serv+";',@DBConnectionString,'"
if ImportAll == 1 and CombineTarget == 1:
   FinalImport = pd.DataFrame()
   for index, row in InputDataSet.iterrows():
       Tbl = "".join(T for T in row["TableName"] if T.isalnum())
       Import = pd.read_excel(row["FullFilePath"], sheetname=row["sheetname"], na_filter=False).astype(str)
       Import["ImportKey"] = row["TableName"]+"_"+row["sheetname"]
       FinalImport = FinalImport.append(Import,ignore_index=True)
       if not FinalImport.empty:
          sqlDS = RxSqlServerData(connection_string = sqlConnString,table = Tbl)
          rx_data_step(input_data = FinalImport, output_file = sqlDS,overwrite = True)')
                     WHILE @I <= (SELECT MAX(Grp1) FROM Tbl_PyImpExp2)
                     BEGIN
                     SET @SQL = CONCAT('SELECT FullFilePath,sheetname,TableName FROM Tbl_PyImpExp2 WHERE Grp1 = ',@I)
                         EXEC sp_execute_external_script
                          @language = N'Python'
                         ,@script = @PythonScript
                         ,@input_data_1 = @SQL
                         ,@params = N'@ImportAll BIT,@CombineTarget BIT,@Serv NVARCHAR(200)'
                         ,@Serv = @Serv
                         ,@ImportAll = @ImportAll
                         ,@CombineTarget = @CombineTarget
                     SET @I = @I + 1
                     END
                 END
     DROP TABLE IF EXISTS Tbl_PyImpExp1,Tbl_PyImpExp2
     END

     END ELSE PRINT 'Invalid parameters: If ImportAll = 0 then pass Excel file & sheet name as input. If ImportAll = 1 then pass Excel file & sheet name blank'

  END ELSE PRINT 'Invalid folder path'
END ELSE PRINT 'Import folder path or database connection string needs to be mentioned'

 END TRY
BEGIN CATCH
PRINT 'Issue while executing this SP, please check whether there is permission to execute the script / to access the folder and input params are valid'
END CATCH
END
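The Tbl_PyImpExp2 step above groups files whose sheets share an identical header list and routes them to one target table, named after the first such file (the FIRST_VALUE window function). The grouping idea can be sketched in plain Python; the file names and headers below are hypothetical illustrations:

```python
# Files whose sheets share the same header list go to one combined target
# table, named after the first such file (FIRST_VALUE equivalent).
# File names and headers are hypothetical illustrations.
files = [
    ("SalesJan", "OrderID,Amount"),
    ("SalesFeb", "OrderID,Amount"),
    ("Stock",    "ItemID,Qty"),
]
target_table = {}   # headers -> first file name seen with those headers
assignments = {}    # file -> target table it will be loaded into
for filename, headers in files:
    target_table.setdefault(headers, filename)
    assignments[filename] = target_table[headers]
print(assignments)
```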




Please see my technet wiki article for detailed explanation on this topic with examples and screenshots and you can also download the scripts and SSRS report from technet gallery site.

We need to be careful with indentation in the Python script, as reformatting it without understanding the code can result in errors.



How to use R script in Power BI

In this post, I have embedded the video link to my YouTube channel, and in the video I show how to use R script in Power BI. I have also posted below the scripts shown in the video.

In the video, I have explained:

1) How to use R script in Power BI
2) Example dataset - the COVID-19 (2019-nCoV) GitHub data repository by Johns Hopkins University, and how to fetch this data dynamically using RCurl
3) How to configure a specific R version in R Studio & Power BI
4) Sample visualization in Power BI using the R dataset






--To enable external scripts execution
EXEC sp_configure  'external scripts enabled', 1
RECONFIGURE WITH OVERRIDE

--To print R version
EXECUTE sp_execute_external_script @language = N'R'
    , @script = N'print(version)';
GO

--To list all R packages installed

EXEC sp_execute_external_script @language = N'R'
    , @script = N'
OutputDataSet <- data.frame(installed.packages()[,c("Package", "Version", "Depends", "License", "LibPath")]);'
WITH RESULT SETS((
            Package NVARCHAR(255)
            , Version NVARCHAR(100)
            , Depends NVARCHAR(4000)
            , License NVARCHAR(1000)
            , LibPath NVARCHAR(2000)

            ));

R script used in Power BI:
library(RCurl)

dt <- format(as.Date(Sys.time())-1, "%m-%d-%Y")
uri1 <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"

uri <- paste0(uri1,dt,".csv")

rawdata <- getURL(uri)

convidds <- read.csv(text = rawdata)
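For readers more comfortable in Python, the date-based URL construction in the R snippet above can be sketched equivalently with the standard library (this is an illustrative translation, not part of the Power BI script):

```python
from datetime import date, timedelta

# Build yesterday's daily-report URL for the Johns Hopkins COVID-19
# repository, mirroring the R snippet above (format mm-dd-yyyy).
base = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
        "csse_covid_19_data/csse_covid_19_daily_reports/")
dt = (date.today() - timedelta(days=1)).strftime("%m-%d-%Y")
uri = base + dt + ".csv"
print(uri)
```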


See Also:

My other posts on Power BI

SQL Server Import/Export excel using R





