Thanks for reading!
It’s tough to say anything concrete without seeing the code, but here are some things I would check.
- Check whether you are running out of memory. With 32 GB that’s unlikely, but it depends on the size of the files.
- If the bottleneck really is the JSON conversion, then multithreading won’t help much. The conversion is CPU-bound, and the GIL only lets one thread execute Python bytecode at a time, so extra threads mostly add scheduling overhead. The burst of CPU usage at the start that then tapers off is probably the period when the files are being read into memory: I/O releases the GIL, so threads can overlap there. Once the CPU-bound conversion starts, it makes sense that your CPU utilization drops to less than a single core, because of the GIL.
- Multiprocessing shouldn’t have that issue: each worker process gets its own interpreter and its own GIL, so conversions can run concurrently across cores. Are all your files tiny? If so, the overhead of spawning processes and pickling data between them can outweigh the benefits.
- The unbalanced time slicing is probably down to how the OS schedules the threads and/or to variation in the size of the files.
You could try loading the files into memory (as many as fit) and then compare the time taken to process them with and without multiprocessing. That isolates the CPU-bound conversion from disk I/O. Unless something else is going wrong, multiprocessing should give a clear speedup on the conversion.