Large PAC file doesn't seem to work properly
I have three PAC files.
First, https://vindicar.name/proxy.pac (under 1KB)
Second, https://vindicar.name/proxy2.pac (1.9MB)
Third, https://vindicar.name/proxy3.pac (519KB)
They are almost identical in structure, performing a search through a list of IPs to determine if a proxy needs to be used. The only difference is that the first has a tiny list of IPs, the second has the actual list of IPs, and the third has a significantly reduced list. The largest one was generated by a script, so I don't expect any sneaky syntax errors in there.
Smaller PAC files work just fine, directing requests through a private proxy. I tested it by adding whatismyipaddress.com to blacklisted domains and comparing its output to another "get my ip" site. The largest PAC file seems to have no effect whatsoever, with Firefox falling back to direct connection.
Is there any size and/or execution time restriction on PAC files?
Appendix 1: General structure of PAC files above:
function FindProxyForURL(url, host) {
    var ipblacklist = [
        // List of IPs goes here, in decimal notation:
        // "127.0.0.1",
        // "192.168.0.1",
        // and so on
    ];
    var domainblacklist = [
        'privoxy.org',
        'whatismyipaddress.com',
    ];
    function endsWith(str, suffix) {
        return str.indexOf(suffix, str.length - suffix.length) !== -1;
    }
    var blocked = false;
    for (var i=0; !blocked && (i<domainblacklist.length); i++)
        blocked = blocked || ((host == domainblacklist[i]) || endsWith(host, '.'+domainblacklist[i]));
    if (blocked || (ipblacklist.indexOf(dnsResolve(host)) != -1))
        // That proxy is accessible via my VPN, you will have to substitute your own for testing
        return "PROXY 10.42.0.1:8118; SOCKS5 10.42.0.1:8118";
    else
        return "DIRECT";
}
By Vindicar
Chosen solution
Some thoughts: I’m not too skilled in this area, but it seems the number of "possible" IPs used for the variable in your rather large script causes the issue at some point. Things work fine for me when the last IP is on line 91136, but no longer for 91137 or more. It’s not clear to me whether this limitation is due to the number of possible IPs for the variable itself (I don’t think so), the dnsResolve(host) function or other parts of the script (or even the pac file’s size), but one of the latter seems more obvious. A quick thought: can you split the IP list into 2 or more sections and hence use 2 or more variables?
Do note that you will probably see the "PAC Execution Error: uncaught exception: out of memory []" error when opening the Browser Console (Ctrl+Shift+J), which even happens when the IP list does not exceed the value above. This presumably slows down the browser too and, moreover, makes the script unreliable. Bugs have been reported for the memory issue; one was solved by increasing the heap size to 4 MB. Therefore any restriction may not be due to a certain number of IPs or another value or function, but simply to the memory the script consumes, so splitting the list may not work either. Altogether, I think people more familiar with PAC files will tell you the file is set up in a way it shouldn't be, like here.
You may be familiar with this page on MDN - note the warning about carefully considering the use of the dnsResolve(host) function. Findproxyforurl.com is a nice reference too. In order to debug and optimize the script, there may be a way to let it throw an alert to indicate the lookup "error", or rather, memory consumption. I assume you thought of blocking subnets rather than separate IPs?
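As a rough illustration of the subnet idea, here is a minimal sketch of matching an IP against a short list of subnets instead of a long list of single addresses. The names ipToInt, subnetBlacklist, inSubnet and isBlacklistedIp are hypothetical, and the two subnets are made-up sample values, not taken from the real list. PAC files also have a built-in isInNet(host, pattern, mask) that could be used instead; the sketch uses plain integer masks so it can be tried outside the browser.

```javascript
// Hypothetical helper: dotted-quad string -> unsigned 32-bit integer.
function ipToInt(ip) {
  var p = ip.split('.');
  return (((p[0] & 0xff) << 24) | ((p[1] & 0xff) << 16) |
          ((p[2] & 0xff) << 8) | (p[3] & 0xff)) >>> 0;
}

// Hypothetical sample data: [network address, prefix length].
var subnetBlacklist = [
  ['10.0.0.0', 8],
  ['192.168.0.0', 16]
];

// True if ip falls inside network/prefixLen.
function inSubnet(ip, network, prefixLen) {
  // JS shift counts wrap at 32, so a /0 prefix is handled explicitly.
  var mask = prefixLen === 0 ? 0 : (0xffffffff << (32 - prefixLen)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(network) & mask) >>> 0);
}

// Linear scan over the (short) subnet list.
function isBlacklistedIp(ip) {
  for (var i = 0; i < subnetBlacklist.length; i++) {
    if (inSubnet(ip, subnetBlacklist[i][0], subnetBlacklist[i][1])) return true;
  }
  return false;
}
```

Inside FindProxyForURL this would be called once with the result of dnsResolve(host), so the number of DNS lookups stays the same as in the original script.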
Hope this helps a bit.
All replies (4)
Try asking for advice on Stack Overflow.
First of all, thank you for your detailed and generally useful response.
Tonnes said
Do note that you will probably see the "PAC Execution Error: uncaught exception: out of memory []" error when opening the Browser Console (Ctrl+Shift+J), which even happens when the IP list does not exceed the value above.
I don't see this message in JS console. I have "Continuous logs" checkbox ticked, so it doesn't get cleared, but I only see messages generated by the site itself, and nothing PAC-related.
Tonnes said
You may be familiar with this page on MDN - note the warning about carefully considering the use of the dnsResolve(host) function.
If you check the code I listed, you will see that dnsResolve(host) gets called at most once, as the argument to the indexOf() method, rather than once per IP listed. That only happens if the domains didn't match, so if I open whatismyipaddress.com, the JS engine should skip the call to indexOf() (and with it dnsResolve()) thanks to short-circuit evaluation, because the blocked variable will already be true.
As far as I'm aware, in JS x = a || b should operate the same as if (a) x = a; else x = b; which for boolean values means b is never evaluated when a is true.
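A minimal sketch of that behaviour, runnable outside a PAC file. slowIpCheck() and shouldProxy() are hypothetical names; slowIpCheck() merely stands in for the indexOf(dnsResolve(host)) lookup and counts how often it runs.

```javascript
// Counts how many times the "expensive" right-hand side is evaluated.
var lookups = 0;

// Hypothetical stand-in for ipblacklist.indexOf(dnsResolve(host)) != -1.
function slowIpCheck() {
  lookups++;
  return false;
}

// Mirrors the shape of "blocked || (ip lookup)" from the PAC script.
function shouldProxy(domainMatched) {
  return domainMatched || slowIpCheck();
}
```

Calling shouldProxy(true) leaves lookups untouched, while shouldProxy(false) increments it. Note that this only covers the lookup itself: an array literal written inside FindProxyForURL is still, at least in principle, rebuilt on every call, whether or not the lookup runs.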
Tonnes said
Some thoughts: I’m not too skilled in this area, but it seems the number of "possible" IPs used for the variable in your rather large script causes the issue at some point. Things work fine for me when the last IP is on line 91136, but no longer for 91137 or more. It’s not clear to me whether this limitation is due to the number of possible IPs for the variable itself (I don’t think so), the dnsResolve(host) function or other parts of the script (or even the pac file’s size), but one of the latter seems more obvious. A quick thought: can you split the IP list into 2 or more sections and hence use 2 or more variables?
The IP list comes from an external source. I changed the way the IPs are encoded, converting them to hexadecimal integers (e.g. 0x7F000001 for "127.0.0.1") and then using convert_addr() to convert the IP of the target host the same way.
This trick reduced filesize from 1.9MB down to 1.1MB, and PAC file started working again. I assume it's because storing and searching 100k+ integers is much easier than doing the same to 100k+ strings, even if there is not that much difference in terms of storage.
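For anyone trying the same trick, here is a minimal sketch under a few assumptions. ipToUint() is a hypothetical stand-in for convert_addr() so the code can run outside the browser; the three list entries are made-up samples; and the binary search is my own substitution for indexOf(), which only works if the generator script emits the list sorted in ascending order. The list is declared outside FindProxyForURL on the assumption that building it once is cheaper than rebuilding it per request. One caveat, as far as I know: convert_addr() is built on 32-bit bitwise operators, so addresses from 128.0.0.0 upward may come back negative, whereas a literal like 0xC0A80001 is positive. The ">>> 0" below normalizes to unsigned.

```javascript
// Hypothetical stand-in for the PAC utility convert_addr(),
// normalized to an unsigned 32-bit value with ">>> 0".
function ipToUint(ip) {
  var p = ip.split('.');
  return (((p[0] & 0xff) << 24) | ((p[1] & 0xff) << 16) |
          ((p[2] & 0xff) << 8) | (p[3] & 0xff)) >>> 0;
}

// Sample values only; must be sorted ascending for the binary search.
var ipBlacklist = [
  0x0A000001, // 10.0.0.1
  0x7F000001, // 127.0.0.1
  0xC0A80001  // 192.168.0.1
];

// Classic binary search: O(log n) comparisons instead of a linear indexOf().
function containsIp(sortedList, value) {
  var lo = 0, hi = sortedList.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >>> 1;
    if (sortedList[mid] === value) return true;
    if (sortedList[mid] < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}
```

In the PAC file the check would then read something like containsIp(ipBlacklist, convert_addr(dnsResolve(host)) >>> 0).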
As I said, IP list comes from an external source, so if it keeps growing, my current solution will stop working. In that case I will have to merge IPs into subnets, and that's a non-trivial task if you want to do it sparingly and with as little "overhead" (IPs that weren't on the original list) as possible.
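A minimal sketch of the zero-overhead variant of that merge, meant for the generator script rather than the PAC file itself. mergeToCidr() is a hypothetical name, and the input is assumed to be a sorted array of unique unsigned 32-bit integers. Two blocks are merged only when they are the two halves of the same parent block, so no IP outside the original list is ever covered; accepting some overhead to get fewer blocks would need a different, lossy strategy that is not shown here.

```javascript
// Merge sorted, unique IPs (as unsigned ints) into exact CIDR blocks.
// Returns an array of [baseAddress, prefixLength] pairs.
function mergeToCidr(sortedIps) {
  var blocks = []; // used as a stack, kept in ascending address order
  for (var i = 0; i < sortedIps.length; i++) {
    var cur = [sortedIps[i], 32]; // every IP starts as a /32
    while (blocks.length > 0) {
      var prev = blocks[blocks.length - 1];
      var size = Math.pow(2, 32 - cur[1]);      // addresses per block
      var sameSize = prev[1] === cur[1];
      var adjacent = prev[0] + size === cur[0];
      var aligned = prev[0] % (size * 2) === 0; // prev is the lower half of a parent block
      if (!(sameSize && adjacent && aligned)) break;
      blocks.pop();
      cur = [prev[0], cur[1] - 1];              // replace both halves by their parent
    }
    blocks.push(cur);
  }
  return blocks;
}
```

For example, 10.0.0.0 through 10.0.0.3 collapse into the single block 10.0.0.0/30, while 10.0.0.1 and 10.0.0.2 stay separate because they do not share a /31.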
By Vindicar
Thanks for the feedback, good you solved it for now. I won’t go into details about the function or coding itself - you’ll probably do a better job in that than me. ;)
Vindicar said
I don't see this message in JS console. I have "Continuous logs" checkbox ticked, so it doesn't get cleared, but I only see messages generated by the site itself, and nothing PAC-related.
Make sure to check the Browser Console, not the Web Console, as you will only see the pac file warning there, along with a loading message and perhaps others. It may be good to watch for the error and use it as an indication for optimizing the script now or when it gets too large.
If it helps: the warning does not display for me when the last IP is on line 90900, but does for 91000. As this is close to the 91136 mentioned earlier, I assume the memory issue is related to the problem, if not its cause.