Task: Camera Simulator
A post about a week-long effort to create
a much-needed multithreaded tool.
For the last week I have had an interesting task: to create a Camera Simulator tool. Yes, it is thrilling when one gets the opportunity to learn something new and create it from scratch, isn't it? Sadly enough, that does not happen too often. Anyway, because of the tight schedule and the project running short on money, there was a real risk of wasting the effort and not getting the desired result. And that made me extra motivated.
As I mentioned, the project is nearing its end, with customer integration in Dallas, TX expected at the beginning of next month (June 2009). The goal of the project, without going into too much detail, is to reach a certain speed and success rate when analysing scanned images of letters like this one (scaled down, with the information in it blurred):
Yeah, quickly and correctly finding features like bar codes, stamps, address location and letter orientation is an enormous task. But Siemens, as the leader in Postal Automation solutions, has longstanding experience, and the German colleagues pull off a pretty amazing job at it. And to test the project before going for integration in Dallas, we need a Camera Simulator that can feed loads of images. That's where the need for this task came from.
The Camera Simulator runs on a computer separate from the one where the Image Analysis takes place. The two are connected with two separate fast Camera Link cables, allowing for speedy transfer of the images of the front and back sides of the letter:
The Camera Simulator, a 4 CPU computer, has two EDT frame grabber boards that can inject the letters (loaded from the local hard disk) through the Camera Link cables into the Image Analysis computer. EDT also provides the drivers to operate the frame grabbers, a run-time DLL and a C++ API library for linking into the program. The computer has a large but slow USB drive and a fast but relatively small internal drive. The goal is to inject the images, each around 8 MB when decompressed (4 MB compressed), at 15 letters per second. With a front and a back image per letter, a quick calculation (2 x 8 MB x 15) shows that 240 MB of data has to be churned around efficiently and reliably every second. Cool!
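The quick calculation can be spelled out as follows. This is only a sketch: it assumes that "15 per second" means 15 letters, each with a front and a back image (that is what makes the 240 MB figure work out), and the constant and function names are made up for illustration.

```cpp
#include <cassert>

// Figures taken from the text above; the names are illustrative only.
constexpr int kDecompressedImageMB = 8;   // one image after decompression
constexpr int kCompressedImageMB   = 4;   // the same image as stored on disk
constexpr int kSidesPerLetter      = 2;   // front and back (assumption, see lead-in)
constexpr int kLettersPerSecond    = 15;  // target injection rate

// Data volume per second for a given image size.
constexpr int throughputMB(int imageMB, int sides, int lettersPerSecond) {
    return imageMB * sides * lettersPerSecond;
}

// What has to leave through the frame grabbers every second.
static_assert(throughputMB(kDecompressedImageMB, kSidesPerLetter,
                           kLettersPerSecond) == 240,
              "decompressed data rate should match the 240 MB from the text");
```

Under the same assumption, the compressed images come off the disk at roughly half that rate, `throughputMB(kCompressedImageMB, kSidesPerLetter, kLettersPerSecond)`, i.e. about 120 MB per second.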
First I had to get comfortable using the EDT hardware through the provided API. Luckily, an example from an experienced colleague and EDT specialist was available, along with a nice class with functions for initializing the board, loading an image into the frame grabber buffer and injecting this buffer. Then I had to find out the maximum performance that could be achieved. The total performance of any system depends on its ability to (a) get its input from the outside world, (b) process the information and (c) output the result. So the bottleneck could be the speed of the disk transfer (a), the speed of the CPUs (b) or the speed of the frame grabbers (c).
(a) My smart ass quickly realized that the hard disk does not allow for parallel processing. There is no point in using several threads to preload a number of images ahead of their usage. You actually get a performance degradation, because the hard disk head goes crazy moving back and forth to read from different places on the magnetic platters. It is more efficient simply to let it read one complete image continuously before going on to the next.
(b) The 4 CPU cores allow for parallel processing of the images: decompressing them, converting them to the standard internal format, and copying them into the frame grabber buffers. Which is kinda nice, so I quickly decided to summon several threads to help.
(c) The frame grabbers, like the HDD, allow only single-threaded use of their API functions. The plus side is that there are two of them, so they can work in parallel.
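Points (a) and (b) together mean: only one thread reads from the disk at a time, while the decompression runs on all cores. Here is a minimal sketch of that idea. It is not the real tool: `loadCompressed` and `decompress` are hypothetical stand-ins that just fabricate a few bytes, and a portable `std::mutex` guards the disk access.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

using Image = std::vector<unsigned char>;

std::mutex g_diskMutex;  // one reader at a time, so the disk head is not thrashed

// Hypothetical stand-in for reading one compressed image file from disk.
Image loadCompressed(int index) {
    std::lock_guard<std::mutex> lock(g_diskMutex);
    return Image(4, static_cast<unsigned char>(index));
}

// Hypothetical stand-in for the CPU-bound work; it simply doubles the data.
Image decompress(const Image& in) {
    Image out(in);
    out.insert(out.end(), in.begin(), in.end());
    return out;
}

// Starts `workers` threads; thread w handles images w, w + workers, ...
std::vector<Image> processAll(int imageCount, int workers) {
    assert(imageCount >= 0 && workers > 0);
    std::vector<Image> results(imageCount);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&results, imageCount, workers, w] {
            for (int i = w; i < imageCount; i += workers) {
                // The read is serialized by the mutex; the decompression
                // runs outside the lock and so proceeds in parallel.
                results[i] = decompress(loadCompressed(i));
            }
        });
    }
    for (auto& t : pool) t.join();
    return results;
}
```

Calling `processAll(10, 4)` would use four threads for ten images. Each thread writes only to its own slots of `results`, so the result vector itself needs no lock.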
With this in mind, the design of the program was ready:
(1) A configurable number of "Image Processing" threads load the images for both sides of a letter from disk, then decompress them and store them in a queue. As mentioned before, it is not a good idea to access files simultaneously, so the code that loads the images from disk is protected with a Windows Critical Section (a very simple and nice API for mutual exclusion).
(2) The "Main" thread gets a notification when the queue filled by the "Image Processing" threads is not empty. Based on the speed control parameter, it notifies "Injection Thread 1" to start working.
(3) When "Injection Thread 1" is notified that a new buffer is available, it: a) loads the buffer into Frame Grabber 1; b) notifies "Injection Thread 2"; and c) injects the buffer (i.e. sends it over the wire).
(4) "Injection Thread 2" does the same as in step (3) for the second frame grabber, and then frees the memory.
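Steps (1) to (4) can be sketched roughly like this. It is a simplified illustration, not the actual program: `std::mutex` and `std::condition_variable` replace the Windows Critical Section and the notifications, a fixed `interval` stands in for the speed control parameter, the images are fabricated bytes, and "injecting" just records the letter id instead of calling the EDT API. All names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Image = std::vector<unsigned char>;
struct Letter { int id; Image front, back; };
using LetterPtr = std::unique_ptr<Letter>;

// Blocking queue: pop() waits for a letter, returns nullptr once closed and empty.
class LetterQueue {
public:
    void push(LetterPtr p) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(p)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    LetterPtr pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return nullptr;
        LetterPtr p = std::move(q_.front());
        q_.pop();
        return p;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<LetterPtr> q_;
    bool closed_ = false;
};

// Returns the letter ids "injected" by grabber 1 and by grabber 2, in order.
std::pair<std::vector<int>, std::vector<int>>
runPipeline(int letterCount, int workers, std::chrono::milliseconds interval) {
    assert(letterCount >= 0 && workers > 0);
    LetterQueue ready, toInjector1, toInjector2;
    std::mutex diskMutex;
    std::vector<int> sent1, sent2;

    // (1) Image Processing threads: serialized load, then hand over via queue.
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (int i = w; i < letterCount; i += workers) {
                LetterPtr l(new Letter{i, {}, {}});
                {   // stand-in for reading both sides from disk
                    std::lock_guard<std::mutex> lock(diskMutex);
                    l->front.assign(4, 1);
                    l->back.assign(4, 2);
                }
                // decompression would happen here, outside the lock
                ready.push(std::move(l));
            }
        });
    }
    // (3) Injection Thread 1: load buffer, notify thread 2, then inject.
    std::thread injector1([&] {
        while (LetterPtr l = toInjector1.pop()) {
            Image grabberBuffer = std::move(l->front);  // a) load into grabber 1
            int id = l->id;
            toInjector2.push(std::move(l));             // b) notify thread 2
            sent1.push_back(id);                        // c) inject (stand-in)
        }
        toInjector2.close();
    });
    // (4) Injection Thread 2: same for the back side, then the letter is freed.
    std::thread injector2([&] {
        while (LetterPtr l = toInjector2.pop()) {
            Image grabberBuffer = std::move(l->back);
            sent2.push_back(l->id);
        }   // l is destroyed at the end of each iteration
    });
    // (2) Main thread: take finished letters and release them at the set pace.
    for (int n = 0; n < letterCount; ++n) {
        LetterPtr l = ready.pop();
        std::this_thread::sleep_for(interval);
        toInjector1.push(std::move(l));
    }
    toInjector1.close();
    for (auto& t : pool) t.join();
    injector1.join();
    injector2.join();
    return {sent1, sent2};
}
```

Because thread 1 notifies thread 2 before it injects, both grabbers can be sending the two sides of the same letter at the same time. With more than one processing thread the letters may reach the main thread out of order; the sketch does not try to restore the order.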
Finally, over the weekend I used 2 systems to process several million letters, checking for memory leaks and stability. Would it have been better to have a separate thread do just the loading of the images?
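For comparison, the alternative from that question could look something like the sketch below: a single loader thread is the only one that touches the disk and passes the compressed images to the workers through a queue, so no lock around the file access is needed. This is untested speculation about a design I did not build; the file read and the decompression are fabricated stand-ins and all names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Image = std::vector<unsigned char>;
struct Job { int index; Image compressed; };

// Queue of compressed images handed from the loader to the workers.
class JobQueue {
public:
    void push(Job j) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(j)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Returns false once the queue is closed and drained.
    bool pop(Job& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Job> q_;
    bool closed_ = false;
};

std::vector<Image> loadThenProcess(int imageCount, int workers) {
    assert(imageCount >= 0 && workers > 0);
    JobQueue jobs;
    std::vector<Image> results(imageCount);

    // The only thread that reads from disk, so reads stay strictly sequential.
    std::thread loader([&] {
        for (int i = 0; i < imageCount; ++i) {
            // stand-in for reading one compressed file
            jobs.push(Job{i, Image(4, static_cast<unsigned char>(i))});
        }
        jobs.close();
    });

    // Workers only do the CPU-bound part.
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            Job j;
            while (jobs.pop(j)) {
                Image out(j.compressed);  // stand-in "decompression": double the data
                out.insert(out.end(), j.compressed.begin(), j.compressed.end());
                results[j.index] = std::move(out);
            }
        });
    }
    loader.join();
    for (auto& t : pool) t.join();
    return results;
}
```

A real version would probably need to cap the queue length, otherwise the loader could run far ahead of the workers and fill the memory with compressed images.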