









[NMLUG] Python tight loop causing massive CPU barfage
I used the following Python code to open 175,000 mail
files, read roughly the first 1024 bytes of each file (the
argument to readlines() is a size hint, not a line count),
and search each for a line beginning with "Subject".
It took 2 minutes 24 seconds on an 800MHz Pentium III
running SuSE 9.0:
real 2m24.097s
user 2m10.270s
sys 0m9.530s
Here's the read/search loop for each file:
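    import os
    import re
    import stat

    myre = re.compile('^Subject')
    for fn in filelist:
        # Skip directories, sockets, and anything else that isn't a regular file.
        mode = os.stat(fn)[stat.ST_MODE]
        if stat.S_ISREG(mode):
            mf_fd = os.open(fn, os.O_RDONLY)
            the_file = os.fdopen(mf_fd, "r", 1024)
            # readlines(1024) takes a size hint, not a line count.
            the_lines = the_file.readlines(1024)
            for line in the_lines:
                m = myre.match(line)
            the_file.close()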
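For what it's worth, here is a slightly tighter variant of the same idea, offered as a sketch rather than a measured improvement. It reads a fixed 1024-byte block instead of relying on the readlines() size hint, uses a plain prefix test in place of the regex, and stops at the first hit. The function name find_subject and the limit parameter are my own inventions for illustration, and I have not timed this against the loop above.

```python
import os
import stat


def find_subject(path, limit=1024):
    """Return the first line in the first `limit` bytes that starts
    with 'Subject', or None if there is none (hypothetical helper)."""
    try:
        # Only look at regular files, as in the loop above.
        if not stat.S_ISREG(os.stat(path).st_mode):
            return None
        with open(path, "rb") as f:
            head = f.read(limit)  # one bounded read per file
    except OSError:
        # Unreadable or vanished file: treat it as "no match".
        return None
    for line in head.splitlines():
        if line.startswith(b"Subject"):
            # latin-1 maps every byte, so decoding cannot fail.
            return line.decode("latin-1")
    return None
```

You would call it as find_subject(fn) for each fn in filelist and collect the non-None results.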
On Tue, 08 Feb 2005 20:44:17 -0700
Paul Tietjens <paul.tietjens@moriarty.k12.nm.us> wrote:
> I have a python script that essentially opens a few
>thousand (between 70,000 and 230,000 or so) files, reads
>the first 1024 bytes and looks for a string match.
>
> The goal is to search an entire partition full of
>Maildirs for specific emails.
>
> I want the process to happen as fast as possible. So
>far, it takes around 21 minutes - but there's a snag.
> While this script is running, every other process on the
>machine becomes sluggish to the point of
>nonresponsiveness.
>
> No amount of playing with nice and priority levels seems
>to help.
>
> What has helped, is a small sleep() in the loop - but
>that raises the amount of time taken to complete the
>tasks fairly rapidly (from 21 minutes to over an hour).
>
> In the end, I set up a goofy sort of throttling that
>alters the amount of time sleep()ing by the average load.
>
> Is there a better way to do this? I'm not much of a
>coder, and I know there are a couple on this list - so
>any tips offered, no matter how nebulous, would be great.
>
> Thanks in advance!