To track file modifications I suggest using stat().
You can store the modification time (st_mtime), and I'd guess it's _a lot_ faster than doing an unnecessary read() over the whole file.
Try benchmarking the two approaches and let me know.
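A minimal sketch of the stat() approach, assuming the caller keeps a dict that maps each path to the st_mtime seen on the previous pass. The names `changed_files` and `seen` are hypothetical, not from the original code. Only os.stat() is called, so no file contents are read.

```python
import os

def changed_files(paths, seen):
    """Return the paths whose modification time differs from the one in `seen`.

    `seen` maps path -> st_mtime recorded by an earlier call and is updated
    in place. Only os.stat() is used, so file contents are never read.
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        if seen.get(path) != mtime:
            changed.append(path)
            seen[path] = mtime  # remember the new time for the next pass
    return changed
```

The first call reports every file; later calls report only files whose st_mtime has changed. One caveat: on filesystems with coarse timestamp resolution (FAT, for example, stores modification times to 2 seconds), two writes in quick succession can leave the time unchanged.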
Lawrence Oluyede | 07/03/17 - 11:28 am
|
|
PS: http://docs.python.org/lib/os-fi...r.html#l2h-2698
Lawrence Oluyede | 07/03/17 - 11:29 am
|
|
In my new changes (which persist the data structures), guess how it tells whether a file has changed?
I can't just drop it into the current code, because I would have to 'store' the modification time and there is no obvious place to do that.
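One possible place to keep the modification times is a small file saved alongside the other persisted data. The sketch below is not Fuzzyman's code: it assumes a hypothetical JSON cache file and hypothetical helpers `load_mtimes`, `save_mtimes` and `stale_files`. pickle would work just as well if the data structures are already pickled.

```python
import json
import os

def load_mtimes(cache_path):
    """Load the path -> st_mtime table saved by a previous run (empty if none)."""
    try:
        with open(cache_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_mtimes(cache_path, mtimes):
    """Write the table to a temp file, then rename it over the old cache."""
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(mtimes, f)
    os.replace(tmp_path, cache_path)  # atomic replacement where the OS supports it

def stale_files(paths, mtimes):
    """Return (stale_paths, current_mtimes) without modifying `mtimes`."""
    current = {p: os.stat(p).st_mtime for p in paths}
    stale = [p for p in paths if mtimes.get(p) != current[p]]
    return stale, current
```

A run would load the table, process only the stale files, and call `save_mtimes(cache_path, current)` after processing succeeds, so a failed run leaves those files marked for reprocessing next time.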
Fuzzyman | 07/03/17 - 12:27 pm
|
|
Dang, I wrote a big comment for this post the other day, but apparently it is lost to the aether; I must have hit the wrong button.
It drew a rather superficial connection between the situation you describe and the more general case of toolchains that keep output files up to date by processing only the input files that have been modified. Source code and compilers are the classic example, and the ubiquitous solution is 'make'.
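The core rule make applies can be sketched in a few lines of Python: a target is rebuilt if it does not exist or if any of its prerequisites has a newer modification time. The function name `is_stale` is hypothetical, and the sketch ignores the rest of make (pattern rules, phony targets, walking the dependency graph).

```python
import os

def is_stale(target, prerequisites):
    """Return True if `target` must be rebuilt, following make's timestamp rule.

    The target is out of date when it is missing or when any prerequisite
    was modified more recently than the target itself.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For the documents case, each output page would be the target and its source document a prerequisite, so only pages whose sources changed get regenerated.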
If memory serves, make even has a command-line switch (-j) to perform the sort of parallelisation you describe, albeit at the process level rather than by thread. A make-based solution would obviously require some sort of intermediate output files to keep track of the inter-document references you mention, but that's not an entirely unpleasant alternative to processing every document every time.
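Those intermediate files could be as simple as one small file per output listing the source documents it references, similar in spirit to the dependency files C compilers can generate for make. The sketch assumes a hypothetical `<output>.deps` JSON file and hypothetical helper names; an output is regenerated when its own source, or any document it references, is newer than it.

```python
import json
import os

def write_deps(output_path, referenced_sources):
    """Record which source documents this output depends on (hypothetical .deps format)."""
    with open(output_path + ".deps", "w") as f:
        json.dump(sorted(referenced_sources), f)

def read_deps(output_path):
    """Return the recorded references, or None if no .deps file exists yet."""
    try:
        with open(output_path + ".deps") as f:
            return json.load(f)
    except FileNotFoundError:
        return None

def needs_rebuild(source_path, output_path):
    """Rebuild if the output or its .deps file is missing, or any dependency is newer."""
    deps = read_deps(output_path)
    if deps is None or not os.path.exists(output_path):
        return True
    out_mtime = os.path.getmtime(output_path)
    for dep in [source_path] + deps:
        # A referenced document that has disappeared also forces a rebuild.
        if not os.path.exists(dep) or os.path.getmtime(dep) > out_mtime:
            return True
    return False
```

The processing step would call `write_deps` each time it regenerates an output, so the reference list stays in step with the document's current contents.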
Jonathan Hartley | 07/03/20 - 2:57 pm