How to split a text file into multiple text files?

Question

Friday, February 23, 2007 3:55 AM

Hi experts

I'm looking for help splitting a large text file into multiple text files based on the size of the file.

Can anybody help me with this?

 

 

All replies (12)

Friday, February 23, 2007 5:16 AM | 1 vote

Some more information about your requirements would help. What exactly do you mean when you say "based on size of the File"? If you wanted each file to be a maximum of a certain size you could do something like:

using System;
using System.IO;

string sourceFileName = @"C:\VS2005 SP1.exe";
string destFileLocation = @"C:\";
int index = 0;
long maxFileSize = 52428800; // 50 MB per part
byte[] buffer = new byte[65536];

using (Stream source = File.OpenRead(sourceFileName))
{
    while (source.Position < source.Length)
    {
        index++;

        // Create a new sub file and read into it
        string newFileName = Path.Combine(destFileLocation, Path.GetFileNameWithoutExtension(sourceFileName));
        newFileName += index.ToString() + Path.GetExtension(sourceFileName);

        // File.Create truncates an existing file; File.OpenWrite would leave stale bytes behind
        using (Stream destination = File.Create(newFileName))
        {
            while (destination.Position < maxFileSize)
            {
                // Work out how many bytes to read without going past maxFileSize
                int toRead = (int) Math.Min(maxFileSize - destination.Position, buffer.Length);
                int bytes = source.Read(buffer, 0, toRead);

                // Are we at the end of the file?
                if (bytes == 0)
                {
                    break;
                }

                destination.Write(buffer, 0, bytes);
            }
        }
    }
}


Friday, February 23, 2007 5:21 AM

Thanks for the reply, Sean Hederman.

Suppose I have a text (.txt) file larger than 500 KB. I want to split it into multiple files.

Example:

temp.txt is a 1209 KB file.

The result should be:

temp1.txt

temp2.txt

temp3.txt


Friday, February 23, 2007 5:37 AM

But what size should each file be? Whatever that size is, set the maxFileSize to that, and run my code, and it will automatically split the file for you into the directory specified in destFileLocation.


Friday, February 23, 2007 8:14 AM

Thank you, I'll try that.


Friday, February 23, 2007 8:55 AM

 

Can you tell me what the maxFileSize field should be here?

I want each file to be 500 KB.

If the main file exceeds 500 KB, I want to split it.


Friday, February 23, 2007 10:19 AM

Well, 500KB is 500x1024 bytes which means the maximum file size should be 512000.
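
For reference, a minimal sketch of that arithmetic together with the "only split when the file exceeds 500 KB" check. The class and member names (SplitThreshold, MaxFileSize, NeedsSplit) are made up for illustration and do not come from the code earlier in the thread.

```csharp
using System;
using System.IO;

public static class SplitThreshold
{
    // 500 KB expressed in bytes: 500 * 1024 = 512000.
    public const long MaxFileSize = 500L * 1024L;

    // Returns true when the file at 'path' is larger than MaxFileSize,
    // i.e. when it needs to be split at all.
    public static bool NeedsSplit(string path)
    {
        return new FileInfo(path).Length > MaxFileSize;
    }
}
```

A file of exactly 512000 bytes is left alone here. Whether the limit is inclusive is a choice the thread does not settle.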


Friday, February 23, 2007 10:56 AM

 

Thank you, got it.

But I'm reading lines from the text file.

This method splits a line in the middle and places the truncated part in the next file.

 

 


Friday, February 23, 2007 11:12 AM

It would have been useful to know that you needed intact lines upfront.

Well, basically you'd rewrite it to use StreamReader and ReadLine. For each line, you'd decide whether writing it would take the current file beyond its maximum size. If it wouldn't, write it to the current file; if it would, start a new file and write it there.
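
A rough sketch of that line-based approach, under a few assumptions that are not from the thread: the text is UTF-8, each part is named like the source with a running number (temp1.txt, temp2.txt, ...), and a single line longer than the limit is written alone into an oversized part rather than being cut. The class name LineSplitter is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineSplitter
{
    // Splits 'sourceFileName' into numbered parts in 'destFolder', keeping every
    // line intact. Returns the number of part files written.
    public static int Split(string sourceFileName, string destFolder, long maxFileSize)
    {
        Encoding encoding = new UTF8Encoding(false); // assumption: UTF-8 text, no BOM
        string baseName = Path.GetFileNameWithoutExtension(sourceFileName);
        string extension = Path.GetExtension(sourceFileName);
        int index = 0;
        long written = 0;
        StreamWriter writer = null;

        try
        {
            using (StreamReader reader = new StreamReader(sourceFileName, encoding))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Size of this line on disk, including the terminator WriteLine adds
                    long lineBytes = encoding.GetByteCount(line)
                                   + encoding.GetByteCount(Environment.NewLine);

                    // Start a new part if there is none yet, or if this line would not
                    // fit. A part that is still empty always accepts the line.
                    if (writer == null || (written > 0 && written + lineBytes > maxFileSize))
                    {
                        if (writer != null) writer.Dispose();
                        index++;
                        string partName = Path.Combine(destFolder, baseName + index + extension);
                        writer = new StreamWriter(partName, false, encoding);
                        written = 0;
                    }

                    writer.WriteLine(line);
                    written += lineBytes;
                }
            }
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
        return index;
    }
}
```

It could be called as LineSplitter.Split(@"C:\temp.txt", @"C:\", 512000). Because ReadLine strips the original line terminators, every line in the parts ends with Environment.NewLine, including the last one.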


Tuesday, July 12, 2011 4:02 AM

I have already split it, but it takes a long time. My file is more than 2-3 GB, so even the split is slow, and I have not yet read the data and inserted it into the database. Any suggestions for reading a CSV file and storing it in a MySQL database in the most efficient way? Please help. Thanks.


Tuesday, July 12, 2011 8:44 AM | 1 vote

Hi,

Steps to achieve the goal:

  1. Find the size of temp.txt in KB and divide it by 500. This tells you roughly how many files you will need. For example, 1209/500 is about 2.42, so you will need 3 files.
  2. Create a StringBuilder and start reading line by line using ReadLine of StreamReader.
  3. After reading each line, calculate the size of the StringBuilder contents in bytes. If it is below 500 KB, continue reading and storing. If it has gone over 500 KB, remove the last line from the StringBuilder.
  4. Write the StringBuilder contents to the next numbered file (temp1.txt, temp2.txt, ...), then start a new StringBuilder with the line you removed.
  5. Repeat steps 3 and 4 until you reach EOF, then write whatever remains to the last file.

Step 3 can also be done another way:
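
The count in step 1 can be sketched with integer ceiling division, which avoids the fractional result altogether. The class and method names are hypothetical.

```csharp
using System;

public static class PartCount
{
    // Number of parts needed to hold 'totalSize' when each part holds at most
    // 'partSize'. Both values must use the same unit (KB or bytes).
    public static long For(long totalSize, long partSize)
    {
        if (partSize <= 0) throw new ArgumentOutOfRangeException("partSize");
        return (totalSize + partSize - 1) / partSize;
    }
}
```

PartCount.For(1209, 500) gives 3. This is only a lower bound when lines are kept intact, because a part is closed as soon as the next line does not fit, so parts are rarely filled exactly.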

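StringBuilder sb = new StringBuilder();
long sbBytes = 0;      // byte size of sb, assuming UTF-8
int fileNumber = 0;
string line;

while ((line = streamReader.ReadLine()) != null)
{
    // Byte size of the line plus its line terminator
    long lineBytes = Encoding.UTF8.GetByteCount(line + Environment.NewLine);

    if (sb.Length > 0 && sbBytes + lineBytes > 500 * 1024)
    {
        // The line does not fit: write the buffer to the next file and start over
        File.WriteAllText("temp" + (++fileNumber) + ".txt", sb.ToString());
        sb.Length = 0;
        sbBytes = 0;
    }

    sb.AppendLine(line);
    sbBytes += lineBytes;
}

// Write whatever is left after the last line
if (sb.Length > 0)
    File.WriteAllText("temp" + (++fileNumber) + ".txt", sb.ToString());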

 

 

Hope this helps. If you have any concerns, feel free to ask.

 

Thanks


Tuesday, July 12, 2011 10:20 AM

Hi!

Can you please test this code:

 

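Public Function SplitFile(ByVal Filename As String, ByVal RecordsToRead As Integer, ByVal Parts As Integer) As Boolean
  Dim data() As String = IO.File.ReadAllLines(Filename)
  ' Only proceed if the file holds enough records for the requested parts.
  ' Records beyond Parts * RecordsToRead are not written anywhere.
  If (Parts * RecordsToRead <= data.Length) Then
   Dim portion(RecordsToRead - 1) As String
   ' Build the part names from the path pieces; Filename.Replace(".", ...) would
   ' also change dots that appear in folder names
   Dim folder As String = IO.Path.GetDirectoryName(Filename)
   Dim baseName As String = IO.Path.GetFileNameWithoutExtension(Filename)
   Dim extension As String = IO.Path.GetExtension(Filename)
   For i As Integer = 0 To Parts - 1
    Array.ConstrainedCopy(data, RecordsToRead * i, portion, 0, RecordsToRead)
    IO.File.WriteAllText(IO.Path.Combine(folder, baseName & (i + 1) & extension), String.Join(vbCrLf, portion))
   Next
  Else
   Return False
  End If
  Return True
End Function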

 

Here 'Filename' is the name of the file, 'RecordsToRead' is the number of records to copy from the file into each new file, and 'Parts' is the number of files you want to create. This may shed some light on how to resolve your issue.
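
For files in the 2-3 GB range mentioned earlier in the thread, loading every line with ReadAllLines may not be practical. Below is a hedged C# sketch of the same split-by-record-count idea that streams the file with File.ReadLines instead. It is not a translation of the function above: it has no 'Parts' parameter, it writes the leftover records into a final shorter file, and the class name RecordSplitter is made up.

```csharp
using System;
using System.IO;

public static class RecordSplitter
{
    // Writes every 'recordsPerFile' lines of 'fileName' to a new numbered file
    // next to the original. Returns the number of files written.
    public static int Split(string fileName, int recordsPerFile)
    {
        if (recordsPerFile <= 0) throw new ArgumentOutOfRangeException("recordsPerFile");

        string folder = Path.GetDirectoryName(Path.GetFullPath(fileName));
        string baseName = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName);
        int part = 0;
        int recordsInPart = 0;
        StreamWriter writer = null;

        try
        {
            // File.ReadLines yields one line at a time instead of loading the whole file
            foreach (string line in File.ReadLines(fileName))
            {
                // Open the next part when there is none yet or the current one is full
                if (writer == null || recordsInPart == recordsPerFile)
                {
                    if (writer != null) writer.Dispose();
                    part++;
                    writer = new StreamWriter(Path.Combine(folder, baseName + part + extension));
                    recordsInPart = 0;
                }
                writer.WriteLine(line);
                recordsInPart++;
            }
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
        return part;
    }
}
```

With a 7-line temp.txt, RecordSplitter.Split("temp.txt", 3) writes temp1.txt and temp2.txt with 3 lines each and temp3.txt with the remaining line.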

 

regards,

Shahan

 


Friday, September 28, 2012 1:22 PM

Thank you so much! With your code and some others I came up with the following solution. I have added a link at the bottom to some code I wrote that uses some of the logic from this page. I figured I'd give credit where credit is due. Thanks!

Below is an explanation of what I needed:

I wrote this because I have some very large '|' delimited files that have \r\n inside some of the columns, and I needed to use \r\n as the end-of-line delimiter. I was trying to import the files using SSIS packages, but some corrupted data in the files made that impossible. The file was over 5 GB, so it was too large to open and fix manually. After reading through lots of forums to understand how streams work, I came up with a solution that reads each character in a file and emits a line based on the definitions I added to it. It is meant for use in a command-line application, complete with help :). I hope this helps some other people out. I haven't found a solution quite like it anywhere else, although the ideas were inspired by this forum and others.
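
The linked code is not reproduced here. As one possible reading of "reads each character and emits a line based on definitions", the sketch below assumes that every record has a known number of columns and treats a \r\n as the end of a record only once enough '|' delimiters have been seen. That rule, the class name RecordReader and its parameters are all assumptions for illustration, and a \r\n inside the last column would still be taken as the end of the record.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RecordReader
{
    // Reads delimited records one character at a time. A "\r\n" ends a record only
    // when the record already holds 'columnCount - 1' delimiters; any earlier "\r\n"
    // is treated as data inside a column and kept.
    public static IEnumerable<string> ReadRecords(TextReader reader, int columnCount, char delimiter)
    {
        StringBuilder record = new StringBuilder();
        int delimiters = 0;
        int c;

        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (ch == delimiter) delimiters++;

            bool endsWithCr = record.Length > 0 && record[record.Length - 1] == '\r';
            if (ch == '\n' && endsWithCr && delimiters >= columnCount - 1)
            {
                record.Length -= 1;            // drop the '\r' of the terminator
                yield return record.ToString();
                record.Length = 0;
                delimiters = 0;
                continue;
            }

            record.Append(ch);
        }

        // Trailing record without a final "\r\n"
        if (record.Length > 0) yield return record.ToString();
    }
}
```

Each record returned this way could then be written out with the same part-file logic discussed earlier in the thread, so that no record is cut across two files.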

http://stackoverflow.com/a/12640862/1582188