Running multiple instances of IFW

User discussion and information resource forum for Image products.
TAC109
Posts: 273
Joined: Tue Sep 06, 2011 10:41 pm

Re: Running multiple instances of IFW

Post by TAC109 »

On Thu, 16 Aug 2012 13:24:25 PDT, TeraByte Support(TP) wrote:

>TAC109 wrote:
>> Thanks
>>
>> Is there any technical reason to stop multiple copies of IFL running
>> successfully in parallel? Same for IFD?
>
>There's an important clarification when running multiple instances of IFL:
>
>If using File (Direct) to create 2 (or more) images on the same target partition, the image files will be corrupted. If using File (Direct), each image should be created on a separate partition. If using File (OS), there's no problem creating more than one image at a time on a mounted share/partition.
>
Thanks for the clarification.

I don't plan to run backups in parallel at present. It was just a
theoretical possibility that occurred to me.
TAC109
Posts: 273
Joined: Tue Sep 06, 2011 10:41 pm

Re: Running multiple instances of IFW

Post by TAC109 »

On Thu, 16 Aug 2012 10:52:00 PDT, DrTeeth wrote:

>Not at all. Just to clarify, are you saying that running two instances
>in parallel will produce more of a benefit if the amount of data being
>backed up is smaller? That's what it seems like to me.

I'm not sure why you think there would be no time saving from running
diff backups in parallel using hash files (created when the full backup
was made).

Looking at some recent backups made on my single-disk laptop I see:
Full backup file size 33GB
Diff backup file size 1GB
Hash file size of the full backup 0.3GB

Assuming 50% compression of the full image, that means the HD
being imaged held approx. 66GB of data.

If I had this 66GB of data split over 2 physical disks, 33GB each,
then, because the 2 disks can do I/O in parallel with little
interference with each other, the data can be read roughly twice as
fast.

Because the hash file contains hashes of each block that was backed up
when the full backup was taken, IFW doesn't need to read back the full
backup file when doing the diff backup. It will just compare the
calculated checksums with those in the hash file to determine if a
particular block on the disk has changed since the full backup was
taken and, if so, write the changed block to the Diff file.
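To sketch the mechanism described above in Python (this is just the idea, not IFW's actual hash-file format or block size, which are assumptions here):

```python
# Sketch of hash-file-based changed-block detection, as described above.
# NOT IFW's real format or algorithm -- block size and MD5 are assumed.
import hashlib

BLOCK_SIZE = 64 * 1024  # illustrative block size

def block_hashes(path):
    """Hash every block of the source; this is what gets saved alongside
    the full backup as the 'hash file'."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            hashes.append(hashlib.md5(block).digest())
    return hashes

def changed_blocks(path, full_hashes):
    """Return indices of blocks that differ from the full-backup hashes.
    Note that only the source disk is read -- the full image file itself
    is never read back."""
    changed = []
    with open(path, "rb") as f:
        for i, old in enumerate(full_hashes):
            block = f.read(BLOCK_SIZE)
            if hashlib.md5(block).digest() != old:
                changed.append(i)  # this block would go into the Diff file
    return changed
```

The point of the sketch: the diff pass costs one read of the source disk plus one read of the (small) hash file, never a re-read of the 33GB image.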

The time taken to read the hash file and write the Diff file is
largely insignificant compared to the time taken to read the two 33GB
disks, so being able to do these Diff backups in parallel would
be almost twice as fast as doing them one after the other.
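Putting rough numbers on that (the per-disk read rate is an assumed illustrative figure, not a measurement):

```python
# Back-of-the-envelope timing for the argument above.
# Two independent 33GB disks at an assumed 100 MB/s sustained read each.
disk_gb = 33
read_rate_mb_s = 100                             # assumed per-disk rate

read_time_s = disk_gb * 1024 / read_rate_mb_s    # time to scan one disk
serial_s = 2 * read_time_s                       # one diff after the other
parallel_s = read_time_s                         # both at once, ideal case

print(round(serial_s / parallel_s, 1))           # prints 2.0 (ideal speedup)
```

That 2.0 only holds if the two disks really can be read without interfering, which is exactly the point in dispute below.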

Cheers
DrTeeth
Posts: 1289
Joined: Fri Aug 12, 2011 6:58 pm

Re: Running multiple instances of IFW

Post by DrTeeth »

"because the 2 disks can do I/O in parallel with little interference with each other"

I'd say this is not correct. Reading from one disk would saturate the I/O path to some extent, so if another stream were added, it would have to compete for resources.

Also, the size of the data being backed up is irrelevant. There is a limit to the amount of data that can be transferred per unit time, and that limit is independent of the total data to be transferred - just as a pipe has a maximum flow rate which is not affected by the total amount of fluid to be moved.
DrTeeth
Posts: 1289
Joined: Fri Aug 12, 2011 6:58 pm

Re: Running multiple instances of IFW

Post by DrTeeth »

On Thu, 16 Aug 2012 20:02:00 PDT, just as I was about to take a herb,
Tom Cole disturbed my reverie and wrote:

>If I had this 66GB of data split over 2 physical disks, 33GB each,
>then, because the 2 disks can do I/O in parallel with little
>interference with each other, the data can be read roughly twice as
>fast.

They may do that, but I doubt it. How can you confirm that one disk
does not saturate the data path? Also, what device are you writing
to? If a single device, then even if your premise above is correct,
writing to that single device will slow down the data flow - data will
only flow as fast as the slowest path.

>Because the hash file contains hashes of each block that was backed up
>when the full backup was taken, IFW doesn't need to read back the full
>backup file when doing the diff backup. It will just compare the
>calculated checksums with those in the hash file to determine if a
>particular block on the disk has changed since the full backup was
>taken and, if so, write the changed block to the Diff file.
>
>The time taken to read the hash file and write the Diff file is
>largely insignificant compared to the time taken to read the two 33GB
>disks, therefore being able to do these Diff backups in parallel will
>be almost twice as fast as doing them one after the other.

The data written or read per unit time is NOT a function of the amount
of data, as you seem to be suggesting - it is a constant. Whatever
savings would be made will be made across the board. A pipe has
a maximum flow rate - that cannot be changed by altering the total
amount of fluid to be transported.
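The two positions in this thread can actually be stated as one simple bottleneck model (a sketch; the rates are made-up illustrative numbers, not measurements of any real system):

```python
# Bottleneck model of the pipe analogy: aggregate throughput is capped
# by whichever is smaller -- the sum of the per-disk rates, or the
# shared path (controller/bus) they both feed into.
def aggregate_rate(disk_rates_mb_s, shared_path_mb_s):
    return min(sum(disk_rates_mb_s), shared_path_mb_s)

# If the shared path is wide, two disks really do add up:
print(aggregate_rate([100, 100], 600))   # prints 200 -> parallel ~2x faster

# If one disk already saturates the path, a second stream buys nothing:
print(aggregate_rate([100, 100], 100))   # prints 100 -> no benefit
```

So both sides are right under different assumptions: the question is whether a single disk saturates the shared path on the machine in question.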
--

Cheers

DrT
______________________________
We may not be able to prevent the stormy times in
our lives; but we can always choose to dance
in the puddles (Jewish proverb).
TAC109
Posts: 273
Joined: Tue Sep 06, 2011 10:41 pm

Re: Running multiple instances of IFW

Post by TAC109 »

On Fri, 17 Aug 2012 00:29:05 PDT, DrTeeth wrote:

>"because the 2 disks can do I/O in parallel with little interference with each other"
>
>I'd say this is not correct. Reading from one disk would saturate the I/O path to some extent, so if another stream were added, it would have to compete for resources. Also, the size of the data being backed up is irrelevant. There is a limit to the amount of data that can be transferred per unit time, and that limit is independent of the total data to be transferred - just as a pipe has a maximum flow rate which is not affected by the total amount of fluid to be moved.
>
If you *must* use pipe analogies, you're ignoring the diameter of the
pipe and how many of them there are.

A narrow pipe represents each HD, one pipe per HD (the mechanical
aspects make them very slow). Each of those pipes feeds into a larger
diameter pipe.

Think of a sewer. If many people flush at the same time, the system
doesn't get overloaded. The smaller individual pipes feed into larger
diameter pipes which can take the load.

If you have a system with 2 physical hard disks, you could easily find
out the effect they have on each other throughput-wise.
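For instance, something along these lines (a sketch using ordinary files as stand-ins; on a real test you'd put one large file on each physical disk, and OS caching will skew the numbers unless the files are big):

```python
# Rough serial-vs-parallel read timer. On one physical disk the parallel
# time should be close to (or worse than) the serial time; on two
# independent disks it should approach half. File placement is up to you.
import threading, time

def read_all(path):
    # Stream the whole file in 1MB chunks, discarding the data.
    with open(path, "rb") as f:
        while f.read(1024 * 1024):
            pass

def timed_serial(paths):
    t0 = time.perf_counter()
    for p in paths:
        read_all(p)
    return time.perf_counter() - t0

def timed_parallel(paths):
    threads = [threading.Thread(target=read_all, args=(p,)) for p in paths]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0
```

Comparing `timed_serial([a, b])` against `timed_parallel([a, b])` for two large files on the two disks would settle the question for that particular machine.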

Cheers

"Life is like a sewer. What you get out of it depends on what you put
into it." - Tom Lehrer
DrTeeth
Posts: 1289
Joined: Fri Aug 12, 2011 6:58 pm

Re: Running multiple instances of IFW

Post by DrTeeth »

On Sat, 18 Aug 2012 15:25:07 PDT, just as I was about to take a herb,
Tom Cole disturbed my reverie and wrote:

>If you *must* use pipe analogies, you're ignoring the diameter of the
>pipe and how many of them there are.

No, my point is that they are a constant in any given case, as is the
maximum possible data flow. This cannot be changed. Your scenario will
work if a) you are not saturating your I/O path on either drive, as you
surmise; b) you are writing to one drive and its input is not the
limiting factor; or c) you are writing to two drives and their input is
not the limiting factor.

I have three disks in this PC, and when one disk is copying at full
crack, the PC's responsiveness is affected - that tells me there
is not enough I/O capacity for another copy operation to run
concurrently and save any time.

Thanks for sticking with this. I am really enjoying the mental sparring
and I am not being deliberately argumentative, honest.
--

Cheers

DrT
______________________________
We may not be able to prevent the stormy times in
our lives; but we can always choose to dance
in the puddles (Jewish proverb).