Hi,
Using the program below, I generated a separate file containing 21 million domain names (the number of .com domains is random, since each line's TLD is picked at random from com, net, org and ir).
PHP code:
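<?php
// Alphabet for the random labels: a-z, A-Z, 0-9 (62 characters)
$chars = implode("", array_merge(range('a', 'z'), range('A', 'Z'), range('0', '9')));
$tlds = array('com', 'net', 'org', 'ir');
// 'w' truncates the file; 'a' would append to the output of an earlier run
$file = fopen('domains.txt', 'w');
if ($file === false) {
    exit(1);
}
for ($x = 0; $x < 21000000; $x++) {
    $length = rand(5, 15);                      // label length: 5 to 15 characters
    $tld = $tlds[rand(0, count($tlds) - 1)];    // each TLD is equally likely
    $chars = str_shuffle($chars);               // new random order, no repeated characters
    fwrite($file, substr($chars, 0, $length) . '.' . $tld . "\n");
}
fclose($file);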
The final file size was 309.7 MB.
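As a sanity check, the expected size can be worked out from the generator's parameters. This assumes `rand(5, 15)` is uniform over the integers 5 to 15 (mean 10), each of the four TLDs is equally likely, and "MB" means 10^6 bytes:

```latex
\begin{aligned}
\mathbb{E}[\text{bytes per line}]
  &= \underbrace{\tfrac{5+15}{2}}_{\text{label}}
   + \underbrace{1}_{\text{dot}}
   + \underbrace{\tfrac{3+3+3+2}{4}}_{\text{TLD}}
   + \underbrace{1}_{\text{newline}}
   = 10 + 1 + 2.75 + 1 = 14.75 \\
\mathbb{E}[\text{file size}]
  &= 21\,000\,000 \times 14.75 = 309\,750\,000 \text{ bytes} \approx 309.75\ \text{MB}
\end{aligned}
```

This agrees with the reported 309.7 MB to within rounding.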
Then I extracted the non-.com addresses with the following program:
PHP code:
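<?php
$source = fopen('domains.txt', 'r');
$dest = fopen('domains-filtered.txt', 'w');
if ($source === false || $dest === false) {
    exit(1);
}
// Read line by line; 128 bytes is far more than the longest line (15 + ".com" + "\n" = 20 bytes)
while (($line = fgets($source, 128)) !== false) {
    // rtrim() drops the trailing "\n" before the last four characters are compared
    if (substr(rtrim($line), -4) !== '.com') {
        fwrite($dest, $line);   // write the original line, newline included
    }
}
fclose($source);
fclose($dest);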
On my machine this program ran in 44.97 seconds, and the output file was 231 MB:
Code:
➜ Desktop /usr/bin/time -v php b.php
Command being timed: "php b.php"
User time (seconds): 12.17
System time (seconds): 32.01
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:44.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 29392
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2015
Voluntary context switches: 4
Involuntary context switches: 1069
Swaps: 0
File system inputs: 8
File system outputs: 451232
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
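The 231 MB output size is close to what the generator's parameters predict. This assumes each TLD is equally likely (so about three quarters of the lines are not .com), the mean label length is 10, "MB" means 10^6 bytes, and, for the last line, that the system is Linux, where GNU time's "File system outputs" counter is reported in 512-byte units:

```latex
\begin{aligned}
\mathbb{E}[\text{non-.com lines}] &= \tfrac{3}{4} \times 21\,000\,000 = 15\,750\,000 \\
\mathbb{E}[\text{bytes per non-.com line}]
  &= \underbrace{10}_{\text{label}} + \underbrace{1}_{\text{dot}}
   + \underbrace{\tfrac{3+3+2}{3}}_{\text{net, org, ir}} + \underbrace{1}_{\text{newline}}
   = \tfrac{44}{3} \approx 14.67 \\
\mathbb{E}[\text{output size}] &= 15\,750\,000 \times \tfrac{44}{3} = 231\,000\,000 \text{ bytes} \approx 231\ \text{MB} \\
\text{File system outputs} &= 451\,232 \times 512 = 231\,030\,784 \text{ bytes} \approx 231\ \text{MB}
\end{aligned}
```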
----------------------
Although this wasn't what the thread starter asked, I was curious to compare the same job in a lower-level language, so I processed the same file once more with a C program:
Code:
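#define _POSIX_C_SOURCE 200809L  /* expose getline() and ssize_t in strict C modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
int main(void){
    char *line = NULL;   /* buffer allocated and grown by getline() */
    size_t len = 0;      /* current capacity of that buffer */
    ssize_t read;        /* bytes read for the current line, "\n" included */
    FILE *source = fopen("./domains.txt", "r");
    if (source == NULL)
        exit(EXIT_FAILURE);
    FILE *dest = fopen("./domains-filtered.txt", "w");
    if (dest == NULL) {
        fclose(source);
        exit(EXIT_FAILURE);
    }
    while ((read = getline(&line, &len, source)) != -1) {
        /* Length without the trailing "\n"; the last line may not have one */
        size_t n = (size_t)read;
        if (n > 0 && line[n - 1] == '\n')
            n--;
        /* Keep the line unless it ends in ".com"; checking n >= 4 first
           keeps the index inside the buffer even for very short lines */
        if (n < 4 || memcmp(line + n - 4, ".com", 4) != 0)
            fwrite(line, 1, (size_t)read, dest);
    }
    fclose(source);
    fclose(dest);
    free(line);          /* free(NULL) is a no-op, so no check is needed */
    exit(EXIT_SUCCESS);
}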
This time the program ran in 2.61 seconds of wall-clock time (about 17 times faster: 44.97 / 2.61 ≈ 17.2):
Code:
➜ Desktop /usr/bin/time -v ./a.out
Command being timed: "./a.out"
User time (seconds): 1.34
System time (seconds): 0.23
Percent of CPU this job got: 60%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.61
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1244
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 64
Voluntary context switches: 6
Involuntary context switches: 114
Swaps: 0
File system inputs: 0
File system outputs: 451224
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
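The wall-clock ratio hides where the difference comes from. Using only the figures in the two /usr/bin/time reports, the ratios work out as follows:

```latex
\begin{aligned}
\text{wall clock:}\quad & \frac{44.97}{2.61} \approx 17.2 \\
\text{CPU (user + system):}\quad & \frac{12.17 + 32.01}{1.34 + 0.23} = \frac{44.18}{1.57} \approx 28.1 \\
\text{user only:}\quad & \frac{12.17}{1.34} \approx 9.1 \\
\text{system only:}\quad & \frac{32.01}{0.23} \approx 139
\end{aligned}
```

The largest gap is in System time, the time the kernel spends working on the process's behalf. Both runs wrote almost the same amount of data (451232 vs 451224 file system outputs), so one plausible explanation, not verified here, is that the PHP script makes many more system calls to write the same bytes, while the C program's `fwrite()` goes through stdio's buffer. The C run's 60% CPU figure (1.57 s of CPU in 2.61 s) also suggests that roughly one second of its wall-clock time was spent waiting, probably on I/O.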