Question
Saturday, February 16, 2013 12:25 AM
I am attempting to read a number of ASCII-encoded files and concatenate them into one ASCII-encoded file, but characters in the extended ASCII family show up as question marks (?) in the concatenated file, and I am uncertain why.
I can get the actual characters to show up correctly when I use "cat file1.txt, file2.txt >> output.txt", but this method is quite slow with large files. I have found that dropping down to the .NET Framework level is much faster, so I am attempting to use StreamReader and StreamWriter, but I cannot get them to write the extended ASCII characters correctly.
My input files look like this (but with different numbers for the different files):
RUN_SQLýdelete from table_0 where Deal_Id = '1111'
RUN_SQLýdelete from table_1 where Deal_Id = '1111'
RUN_SQLýdelete from table_2 where Deal_Id = '1111'
My script looks like this:
$dRAW = "C:\kr\scripts\file_cat"
# Collect the full paths of the input files
$InputFile = New-Object System.Collections.ArrayList
foreach ($file in (Get-ChildItem "$dRAW\*.txt"))
{
    [void]$InputFile.Add($file.FullName)
}
$OutputFile = "$dRAW\download.mnt"
# $true = append to the output file
$StreamWriter = New-Object System.IO.StreamWriter($OutputFile, $true, [System.Text.Encoding]::ASCII)
foreach ($IndividualFile in $InputFile)
{
    # $false = do not detect the encoding from a byte order mark
    $StreamReader = New-Object System.IO.StreamReader($IndividualFile, [System.Text.Encoding]::ASCII, $false)
    $Line = $StreamReader.ReadToEnd()
    $StreamWriter.Write($Line)
    $StreamWriter.Flush()
    # Close each reader before opening the next one
    $StreamReader.Close()
}
$StreamWriter.Close()
The output looks like this:
RUN_SQL?delete from table_0 where Deal_Id = '1111'
RUN_SQL?delete from table_1 where Deal_Id = '1111'
RUN_SQL?delete from table_2 where Deal_Id = '1111'
My system's $OutputEncoding variable is set to US-ASCII, and I would have thought it doesn't matter anyway, since I'm specifying both my input and output streams as ASCII...
Thank you, Kevin Roman
All replies (3)
Saturday, February 16, 2013 5:47 PM ✅Answered | 1 vote
http://www.yoda.arachsys.com/csharp/unicode.html
ASCII is one of the most commonly known and frequently misunderstood character encodings. Contrary to popular belief, it is only 7 bit - there are no ASCII characters above 127. If anyone says that they wish to encode (for example) "ASCII 154" they may well not know exactly which encoding they actually mean. If pressed, they're likely to say it's "extended ASCII". There is no encoding scheme called "extended ASCII". There are many 8-bit encodings which are supersets of ASCII, and usually it is one of these which is meant - commonly whatever Windows Code Page is the default for their computer. Every ASCII character has the same value in the ASCII encoded as in the Unicode coded character set - in other words, ASCII x is the same character as Unicode x for all characters within ASCII. The .NET ASCIIEncoding class (an instance of which can be easily retrieved using the Encoding.ASCII property) is slightly odd, in my view, as it appears to encode by merely stripping away all bits above the bottom 7. This means that, for instance, Unicode character 0xb5 ("micro sign") after encoding and decoding would become Unicode 0x35 ("digit five"), rather than some character showing that it was the result of encoding a character not contained within ASCII.
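A minimal sketch of what the ASCII encoder does with the ý from the question, assuming a .NET version from 2.0 onwards. As far as I know, those versions substitute a question mark for anything outside the 7-bit range rather than stripping the high bit as the quoted passage describes, which matches the output shown in the question. The sample string is only an illustration.

```powershell
# Build ý (U+00FD) from its code point so the script file's own encoding does not matter.
$text = 'RUN_SQL' + [char]0xFD + 'delete'

$ascii = [System.Text.Encoding]::ASCII
$bytes = $ascii.GetBytes($text)

# Index 7 is where ý was; the encoder wrote 0x3F ('?') in its place.
'{0:X2}' -f $bytes[7]        # 3F

# Decoding the bytes again cannot bring the original character back.
$ascii.GetString($bytes)     # RUN_SQL?delete
```

The same substitution applies whether the character is lost while reading or while writing, so specifying ASCII on both streams does not protect it.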
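With "cat file1.txt, file2.txt >> outputfile.txt", the resulting outputfile.txt is not ASCII-encoded; by default the redirection operator in Windows PowerShell writes UTF-16 (which PowerShell calls Unicode).
If you want to use StreamReader and StreamWriter, you must change the encoding from ASCII to one that contains the character, such as Default or UTF8.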
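A rough sketch of that suggestion, not a definitive implementation. The function name Join-TextFile and its parameters are hypothetical, and the Windows-1252 default is an assumption about the input files; pass whichever encoding the files were actually written in. The reader and writer share one encoding so that every byte survives the round trip.

```powershell
# Hypothetical helper: concatenate text files using one explicit 8-bit encoding.
function Join-TextFile {
    param(
        [string[]] $Path,         # input files, in the order to concatenate
        [string]   $Destination,  # output file (overwritten if it exists)
        # Assumption: the inputs are Windows-1252; pass another encoding if not.
        [System.Text.Encoding] $Encoding = [System.Text.Encoding]::GetEncoding(1252)
    )

    # $false = overwrite rather than append
    $writer = New-Object System.IO.StreamWriter($Destination, $false, $Encoding)
    try {
        foreach ($file in $Path) {
            # $false = do not sniff for a byte order mark; trust $Encoding
            $reader = New-Object System.IO.StreamReader($file, $Encoding, $false)
            try {
                $writer.Write($reader.ReadToEnd())
            }
            finally {
                $reader.Dispose()
            }
        }
    }
    finally {
        $writer.Dispose()
    }
}
```

Usage would look something like Join-TextFile -Path $InputFile -Destination $OutputFile. Full paths are safer here, because the .NET classes resolve relative paths against the process working directory rather than the current PowerShell location.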
Saturday, February 16, 2013 8:55 AM
For this task, the best choice is PowerShell ISE, which fully supports Unicode. Alternatively, try changing the encoding from ASCII to Default, UTF8, or Unicode so the characters display correctly.
Saturday, February 16, 2013 4:06 PM
Kazun,
The file that I am starting out with is encoded in ASCII, and in the end (though right now I am just trying it from a PowerShell prompt) I will be running this from a script that is started by a DOS batch script, so I cannot use the PowerShell ISE. It is being called like so:
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -command "c:\kr\scripts\file_cat\contcat.ps1"
Either way, that does not really explain why this works when I use "cat file1.txt, file2.txt >> outputfile.txt" and not when I use my above logic... I would like to understand what I am doing wrong so I can fix it in past and future scripts.
Also, I don't understand why changing the encoding to something other than ASCII would matter, when this is an ASCII character (alt+236)
http://www.theasciicode.com.ar/
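A small illustration of why "alt+236" is a code page position rather than an ASCII value. The choice of code pages 850 and 1252 is an assumption made for the example, not something the thread states about the input files.

```powershell
$y = [string][char]0xFD   # ý, U+00FD

# Assumption: 850 (an OEM/DOS code page) and 1252 (a Windows ANSI code page)
# are used here purely as two examples of 8-bit supersets of ASCII.
foreach ($codePage in 850, 1252) {
    $enc = [System.Text.Encoding]::GetEncoding($codePage)
    'CP{0}: byte {1}' -f $codePage, $enc.GetBytes($y)[0]
}
# CP850: byte 236
# CP1252: byte 253
```

Both values are above 127, so neither is part of 7-bit ASCII, and which byte stands for ý depends entirely on the code page the file was written in.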
Thank you,
Kevin Roman