Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DbaCsv needs support for GUID target columns #9433

Open
petervandivier opened this issue Jul 25, 2024 · 11 comments
Open

DbaCsv needs support for GUID target columns #9433

petervandivier opened this issue Jul 25, 2024 · 11 comments
Assignees
Labels
3rd party Issue relates to a 3rd party library used by the module bugs life

Comments

@petervandivier
Copy link
Contributor

petervandivier commented Jul 25, 2024

Verified issue does not already exist?

I have searched and found no existing issue

What error did you receive?

WARNING: [13:39:27][Import-DbaCsv] Failure | Invalid cast from 'System.String' to 'System.Guid'.


No obvious quick fix jumps out at me reading through the SqlBulkCopy docs at this time.

Possibly related: #8409

Steps to Reproduce

It appears that the SqlBulkCopy object as currently defined in Invoke-DbaCsv does not support inserting valid guid strings into a uniqueidentifier target column.

The below repro errors out:

Invoke-DbaQuery `
    -SqlInstance localhost `
    -Database tempdb `
    -Query "create table foo ([Guid] uniqueidentifier);"

New-Guid | Export-Csv ./foo.csv 

Import-DbaCsv `
    -SqlInstance localhost `
    -Database tempdb `
    -Table foo `
    -Path ./foo.csv `
    -SingleColumn

I tried to step through the source code to see if there was something obvious I was missing about this functionality. Extracting just the relevant code for a minimal repro appears to indicate that this may be an underlying issue with SqlBulkCopy

SqlBulkCopy repro initialisation
$server = Connect-DbaInstance -SqlInstance localhost -Database tempdb
$sqlconn = $server.ConnectionContext.SqlConnectionObject
if ($sqlconn.State -ne 'Open') {$sqlconn.Open()}

# Import-DbaCsv.ps1#L540
# [int]$bulkCopyOptions = ([Microsoft.Data.SqlClient.SqlBulkCopyOptions]::Default)

# $bulkcopy = New-Object Microsoft.Data.SqlClient.SqlBulkCopy(
#     $sqlconn, 
#     $bulkCopyOptions, 
#     $transaction # $null
# )
# # New-Object: Cannot find an overload for "SqlBulkCopy" and the argument count: 3

$bulkcopy = New-Object Microsoft.Data.SqlClient.SqlBulkCopy($sqlconn)

$bulkcopy.DestinationTableName = 'foo'
$bulkcopy.BulkCopyTimeout = 0
$bulkCopy.BatchSize = 100
$bulkCopy.NotifyAfter = 30000
$bulkCopy.EnableStreaming = $true

$stream = [System.IO.File]::OpenRead("foo.csv")

$textReader = New-Object System.IO.StreamReader(
    $stream, 
    [System.Text.Encoding]::"UTF8"
)

$FirstRowHeader = $true
$Delimiter = ","
$Quote = '"'
$Escape = '"'
$Comment = '#'
$TrimmingOption = "None"
$BufferSize = 4096

$reader = New-Object LumenWorks.Framework.IO.Csv.CsvReader(
    $textReader,
    $FirstRowHeader,
    $Delimiter,
    $Quote,
    $Escape,
    $Comment,
    [LumenWorks.Framework.IO.Csv.ValueTrimmingOptions]::$TrimmingOption,
    $BufferSize,
    $NullValue # $null
)

After running the above initialisation, executing the below errors out.

$bulkCopy.WriteToServer($reader)

MethodInvocationException: Exception calling "WriteToServer" with "1" argument(s): "The given value '8a8b0b97-bda2-4d1a-a978-5a8c1caa5cb2' of type String from the data source cannot be converted to type uniqueidentifier for Column 0 [Guid]."

Please confirm that you are running the most recent version of dbatools

v2.1.15 - May 18, 2024

Other details or mentions

Slack channel chat context: https://sqlcommunity.slack.com/archives/C1M2WEASG/p1721846945371649

What PowerShell host was used when producing this error

VS Code (integrated terminal)

PowerShell Host Version

Name                           Value
----                           -----
PSVersion                      7.4.3
PSEdition                      Core
GitCommitId                    7.4.3
OS                             Microsoft Windows 10.0.22631
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

SQL Server Edition and Build number

Microsoft SQL Server 2022 (RTM-GDR) (KB5040936) - 16.0.1121.4 (X64) 
	Jul  2 2024 00:22:34 
	Copyright (C) 2022 Microsoft Corporation
	Developer Edition (64-bit) on Windows 10 Enterprise 10.0 <X64> (Build 22631: ) (Hypervisor)

.NET Framework Version

.NET 8.0.6

@petervandivier petervandivier added bugs life triage required New issue that has not been reviewed by maintainers labels Jul 25, 2024
@andreasjordan
Copy link
Contributor

If you think it's an issue with SqlBulkCopy - why don't you open an issue against SqlBulkCopy? As you have shown, there is nothing dbatools can do here.

@petervandivier
Copy link
Contributor Author

I don't believe that I've demonstrated there is nothing to be done. A cursory glance at the SqlBulkCopy source code seems to imply that GUID support exists (it would be a little shocking if it didn't).

While I am unable at this time to parse out where the root cause exists, I'll be happy to open up an issue against either the SqlBulkCopy source or docs pages as and when the time seems ripe to do so.

Even assuming SqlBulkCopy must be modified at source to support GUID target types, I imagine the lack of support demonstrated in this issue might warrant the addition of a unit test on resolution.

@andreasjordan
Copy link
Contributor

My last comment was maybe too short.

I don't see a quick fix. And I don't have time to do deeper investigations. So if you have the time and a solution, I would be happy to include it in the dbatools source code.

But maybe in the future, some other contributor can take the time and come up with a solution.

@niphlod
Copy link
Contributor

niphlod commented Jul 29, 2024

if bulkcopy didn't support GUIDs there would be zillions of users ranting.
I think the culprit here is that we use an external lib (Lumen) to read the csv, and that doesn't output a guid column, rather a varchar one.

@petervandivier
Copy link
Contributor Author

petervandivier commented Jul 29, 2024

Idk, there appears to exist a GUID unit test in the LumenWorks source 🤔

namespace LumenWorks.Framework.Tests.Unit.IO.Csv
{
    [TestFixture]
    public class ColumnTests
    {
        [Test]
        public void ConvertGuid()
        {
            var expected = Guid.NewGuid();
            var column = new Column { Name = "A", Type = typeof(Guid) };
            var candidate = column.Convert(expected.ToString());
            Assert.IsInstanceOf<Guid>(candidate);
            Assert.That(expected == (Guid)candidate);
        }

@niphlod
Copy link
Contributor

niphlod commented Jul 31, 2024

and what tells lumen to use a guid column when reading ? it doesn't do heuristic AFAIK, so we know that lumen supports guid, but IMHO we're not telling via Import-DbaCsv (again, AFAIK, we support playing with column mapping via NAME, nothing via TYPE) that a guid column is needed when reading the csv.

@niphlod
Copy link
Contributor

niphlod commented Jul 31, 2024

BTW, definitely related to #8409 as noted in the issue, it's the same exact deal (in that case, we're not telling Lumen to output a bit, in this we're not telling Lumen to output a guid)

@niphlod
Copy link
Contributor

niphlod commented Jul 31, 2024

so, ideally, fixable. I'll work on this and report back if there are further limitations

@niphlod niphlod added 3rd party Issue relates to a 3rd party library used by the module and removed triage required New issue that has not been reviewed by maintainers labels Jul 31, 2024
@niphlod niphlod self-assigned this Jul 31, 2024
@petervandivier
Copy link
Contributor Author

petervandivier commented Jul 31, 2024

we know that lumen supports guid, but IMHO we're not telling via Import-DbaCsv

This condition pops if-and-only-if we're inserting to a pre-existing table. At first thought a schema check would allow mapping of this sort.

Since it's not been reported before it seems like a potentially niche condition (based on only 2 reports in 2 years), perhaps this case is low-occurrence enough that it's prudent to save the schema check for a catch{} block if one of the known errors pops so we avoid the overhead for runs where it doesn't matter

@niphlod
Copy link
Contributor

niphlod commented Jul 31, 2024

when the table gets autocreated all columns are nvarchar(max) so it's easy :D

@niphlod
Copy link
Contributor

niphlod commented Aug 8, 2024

I did not forget about this, something that seemed like a 50LOC update is turning into a 500LOC one. I need to play a bit with lumen's API (fortunately there are examples to follow) directly in c# to figure out how to make it work on PS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3rd party Issue relates to a 3rd party library used by the module bugs life
Projects
None yet
Development

No branches or pull requests

3 participants