I've been trying to track down an odd encoding issue with artifacts coming out of GitLab.

One XML file was going in as UTF-8 and coming out as UCS-2 LE BOM. After a stack of testing, I'm genuinely shocked to discover it's PowerShell doing the damage.

The PowerShell script is even running on a Windows box!! I have this code in a script:

function Update-SourceDataFileVersion
{
  Param ([string]$Version)

  foreach ($o in $input) 
  {
    Write-output $o.FullName 
    $TmpFile = $o.FullName + ".tmp" 

     get-content $o.FullName | 
        %{$_ -replace 'x.x.x.x', $Version } > $TmpFile

     move-item $TmpFile $o.FullName -force
  }
}

And I know I need to specify an encoding. From looking at other answers on SO I should be able to do this but I just cannot find the right syntax.

I've tried:

function Update-SourceDataFileVersion
{
  Param ([string]$Version)

  foreach ($o in $input) 
  {
    Write-output -Encoding utf8 $o.FullName 
    $TmpFile = $o.FullName + ".tmp" 

     get-content -Encoding utf8 $o.FullName | 
        %{$_ -replace 'x.x.x.x', $Version } > $TmpFile -Encoding utf8

     move-item $TmpFile $o.FullName -force
  }
}

This follows the other examples, but it just results in empty files.

How can I stop PowerShell from breaking my files and make it write the right encoding? I'm running PS 5.1.

  • Windows PowerShell is, unfortunately, wildly inconsistent with respect to default character encodings, unlike PowerShell (Core) 7+, which now consistently defaults to BOM-less UTF-8. Note that while executing $PSDefaultParameterValues['*:Encoding'] = 'utf8' first can make Windows PowerShell v5.1's > operator produce UTF-8 files, they will invariably have a BOM - see this answer. Commented Feb 5, 2021 at 21:02

1 Answer

In your example you are using the redirection operator > to save the output to a file. > is an operator, not a cmdlet, so it doesn't support options such as -Encoding. Appending -Encoding utf8 after it therefore doesn't set the encoding of the output file.

Instead, you want to use the Out-File cmdlet:

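function Update-SourceDataFileVersion
{
  Param ([string]$Version)

  # $input enumerates the file objects piped into the function
  foreach ($o in $input)
  {
    $TmpFile = $o.FullName + ".tmp"

    # Read as UTF-8, replace the placeholder, and write the result as UTF-8.
    # Note: -replace treats 'x.x.x.x' as a regular expression.
    Get-Content -Encoding utf8 $o.FullName |
      ForEach-Object { $_ -replace 'x.x.x.x', $Version } |
      Out-File -FilePath $TmpFile -Encoding utf8

    # Swap the temporary file in over the original
    Move-Item $TmpFile $o.FullName -Force
  }
}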

BTW: I think you are using Write-Output in the wrong way: it is used to pass an object along the pipeline, not to write to a file. If you want to log the file name, you should use Write-Host instead.
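One caveat on the function above: in Windows PowerShell 5.1, Out-File -Encoding utf8 writes UTF-8 with a BOM. If whatever consumes the XML needs BOM-less UTF-8, one possible workaround is to go through .NET directly. The following is only a sketch under that assumption; the helper name Update-FileUtf8NoBom, its parameters, and the sample file are hypothetical and not part of the original script.

```powershell
# Hypothetical helper: replaces text in a file and writes it back
# as UTF-8 without a byte order mark (BOM).
function Update-FileUtf8NoBom
{
  Param (
    [string]$Path,
    [string]$Pattern,      # regular expression, as used by -replace
    [string]$Replacement
  )

  # Resolve to a full path: .NET methods use the process working directory,
  # which can differ from PowerShell's current location.
  $FullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

  # $false = do not emit a BOM when writing.
  $Utf8NoBom = New-Object System.Text.UTF8Encoding($false)

  $Text = [System.IO.File]::ReadAllText($FullPath, $Utf8NoBom)
  $Text = $Text -replace $Pattern, $Replacement
  [System.IO.File]::WriteAllText($FullPath, $Text, $Utf8NoBom)
}

# Usage: create a sample file, patch the version, then inspect the result.
$Sample = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-version.xml'
[System.IO.File]::WriteAllText($Sample, '<version>x.x.x.x</version>')
Update-FileUtf8NoBom -Path $Sample -Pattern 'x\.x\.x\.x' -Replacement '1.2.3.4'

$Bytes  = [System.IO.File]::ReadAllBytes($Sample)
$Result = [System.IO.File]::ReadAllText($Sample)
```

If the file starts with the bytes EF BB BF, it has a UTF-8 BOM; here $Bytes[0] should simply be the first character of the content. The dots in the pattern are escaped because -replace takes a regular expression.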

9 Comments

Good grief. I don't touch PS much, so many gotchas. I found this works too: $OutputEncoding = [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8; $PSDefaultParameterValues['*:Encoding'] = 'utf8'
It's really not obvious. You read that ">" is a shortcut for xxxx, but nothing then tells you the key caveat that you can't use options with it, and that just leads to confusion. Thanks for your help!
You're welcome. Regarding the shortcuts: I'm making a habit of not using them. It gets a bit more wordy, but it saves you from some issues.
Windows PowerShell is, unfortunately, wildly inconsistent with respect to default character encodings, unlike PowerShell (Core) 7+, which now consistently defaults to BOM-less UTF-8 - see this answer.
As an aside: If the input objects are strings, it is faster to use Set-Content instead of Out-File / >, which may matter when writing large files - see this answer. Note: Windows PowerShell's Set-Content defaults to the active ANSI code page.
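A rough sketch of the Set-Content variant mentioned in the last comment, assuming each file fits comfortably in memory. The function name Update-SourceDataFileVersionSetContent and the sample file are hypothetical. The explicit -Encoding utf8 avoids the ANSI default noted above; in Windows PowerShell 5.1 the output will still carry a BOM.

```powershell
function Update-SourceDataFileVersionSetContent
{
  Param ([string]$Version)

  foreach ($o in $input)
  {
    # -Raw returns the whole file as one string, keeping the original newlines.
    $Text = Get-Content -LiteralPath $o.FullName -Raw -Encoding utf8

    # Dots are escaped so the pattern matches only the literal placeholder.
    $Text = $Text -replace 'x\.x\.x\.x', $Version

    # -NoNewline stops Set-Content from appending an extra trailing newline.
    Set-Content -LiteralPath $o.FullName -Value $Text -Encoding utf8 -NoNewline
  }
}

# Usage: create a sample file, pipe it in, and read the result back.
$Sample = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-setcontent.xml'
Set-Content -LiteralPath $Sample -Value '<version>x.x.x.x</version>' -Encoding utf8
Get-Item -LiteralPath $Sample | Update-SourceDataFileVersionSetContent -Version '1.2.3.4'
$Patched = Get-Content -LiteralPath $Sample -Raw -Encoding utf8
```

Because the content is read fully into a variable before it is written, no temporary file is needed in this version.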