Perl

Perl is the greatest language ever.

On Fri, 5 Feb 2010 22:05:27 +0100
Gisle Aas <gisle@aas.no> wrote:

> http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding
> specifies how to pre-scan an HTML document to sniff the charset.
> Would it not be simpler to just implement the algorithm as specified
> instead of using a generic parser. The use of HTML::Parser to
> implement this sniffing was just me trying a shortcut since
> HTML::Parser seemed to implement a superset of these rules.

Those rules look somewhat involved to me, especially knowing that we already
have both XS and Pure Perl parsers at hand.

Two thoughts:

1. What about using HTML::Encoding, after adapting it so that it has only a
conditional dependency on HTML::Parser and uses HTML::Parser only if it is available?
(It already tries several detection methods before getting to HTML::Parser):

http://search.cpan.org/~bjoern/HTML-Encoding/

A variation on this idea would be for *it* to use a pure Perl HTML parser
instead of skipping the HTML parsing check completely.

2. I note this from the spec page you reference:

"This algorithm is a willful violation of the HTTP specification, which requires that the encoding be assumed to be ISO-8859-1 in the absence of a character encoding declaration to the contrary, and of RFC 2046, which requires that the encoding be assumed to be US-ASCII in the absence of a character encoding declaration to the contrary. This specification's third approach is motivated by a desire to be maximally compatible with legacy content. [HTTP] [RFC2046]"

According to this, we can skip all this encoding-detection work and still be HTTP spec compliant (although it might be more user-friendly to keep trying to guess).
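
To give a feel for what such sniffing involves, here is a much-simplified pre-scan sketch. It is only an illustration under stated assumptions, not the full algorithm from the HTML5 draft: it looks at the first 1024 bytes only, handles just the charset information inside <meta> tags, and ignores comments and other edge cases. The function name sniff_meta_charset is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the charset named in a <meta> tag near the
# start of the document, or nothing if none is found. This is NOT the full
# HTML5 pre-scan; comments, scripts and attribute edge cases are ignored.
sub sniff_meta_charset {
    my ($html) = @_;
    my $head = substr($html, 0, 1024);    # only inspect the first 1024 bytes

    while ($head =~ /<meta\b([^>]*)>/gi) {
        my $attrs = $1;
        # Matches both <meta charset="utf-8"> and the charset=... part of
        # <meta http-equiv="Content-Type" content="text/html; charset=...">
        if ($attrs =~ /\bcharset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)/i) {
            return lc $1;
        }
    }
    return;
}

my $charset = sniff_meta_charset('<html><head><meta charset="UTF-8"></head>');
print "Sniffed charset: ", (defined $charset ? $charset : '(none)'), "\n";
```

A real implementation would still need a default for the case where nothing is found, which is exactly the policy question raised in point 2.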

Mark


--
http://mark.stosberg.com/



http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding
specifies how to pre-scan an HTML document to sniff the charset.
Would it not be simpler to just implement the algorithm as specified
instead of using a generic parser. The use of HTML::Parser to
implement this sniffing was just me trying a shortcut since
HTML::Parser seemed to implement a superset of these rules.

--Gisle


I now have working code published which allows HTTP::Message to work without
the dependency on HTML::Parser. This is useful because it's a step towards
splitting out some of the HTTP modules into their own distribution which does
not have this dependency, which in turn depends on a C compiler. So, this
project could help allow parts of LWP to be used in places where a C compiler
is not available, or when it would be more convenient to distribute one code
line that could be used directly on multiple architectures.
(But this is not the only use of HTML::Parser by the distribution.
LWP::UserAgent makes use of HTML::HeadParser, which in turn uses HTML::Parser.)

My code is here:

http://github.com/markstos/libwww-perl/tree/remove-html-parser-dependency

The solution passes the numerous existing tests for charset detection, as well
as a new one I added.

However, I'm not yet recommending that the work be merged because the approach
is not clean.

Essentially I have embedded a fairly full-featured Pure Perl HTML parser
into HTTP::Message. :) The code was taken from my fork of the
"HTML::Parser::Simple" project and specialized some for this case:

http://github.com/markstos/html--parser--simple

I think a cleaner approach would be to publish this Pure Perl HTML parser, and
then have an option to use it if HTML::Parser is not available.
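
A minimal sketch of that "use it if available" idea follows. It assumes nothing about the real LWP code: first_loadable is a helper invented for this example, and 'HTML::Parser::PurePerl' is only a placeholder name, not a published module.

```perl
use strict;
use warnings;

# Return the first module from @candidates that can be loaded, or nothing.
sub first_loadable {
    my (@candidates) = @_;
    for my $module (@candidates) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        return $module if eval { require $file; 1 };
    }
    return;
}

# Prefer the XS parser; fall back to a pure Perl one if it is missing.
# 'HTML::Parser::PurePerl' is a placeholder name used for illustration.
my $backend = first_loadable('HTML::Parser', 'HTML::Parser::PurePerl');
print defined $backend
    ? "Using $backend\n"
    : "No HTML parser available; skipping <meta> charset detection\n";
```

The point of the design is that the C compiler requirement becomes optional: code that finds no backend can still run and simply skip the parser-based check.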

A little history about HTML::Parser::Simple:

Ron Savage created the project based on the htmlparser.js JavaScript parser by
John Resig. This branch was not trying to be particularly compatible with
anything. It defines a new API. In particular, it bundles a parse tree
*consumer*, Tree::Simple, as well as a parse tree producer.

I forked the project and made some incompatible changes to pursue a different
goal: Create a pure Perl HTML parser that is compatible with the HTML::Parser
API. Or specifically, I wanted to emulate the HTML::Parser 2.x API enough so that
my parser could be used in place of it with HTML::FillInForm. My work met that
goal: it passes the HTML::FillInForm test suite apart from some minor
failures that I don't think matter.

This new case of parsing meta tags is another specialized use of the parser
that gives me another reason to publish the work.

Here's the problem: While I care about these specific goals for an HTML::Parser
that is "compatible enough", I'm not really interested in personally pursuing
the idea of a Pure Perl HTML::Parser that is 100% compatible with HTML::Parser
just for the sake of it. In short, I'm sure there will be change requests
beyond the uses I care about, and I'm not interested in maintaining the
module to extend it for other uses.

There's also the matter of what to name it, since a version of
HTML::Parser::Simple already exists. There's always HTML::Parser::PP or
HTML::Parser::PurePerl, but those names just invite the idea that the goal is
to be 100% compatible with HTML::Parser.

I'll discuss the matter further with Ron Savage to get his thoughts.

Feedback on the topic from other LWP users is welcome.

Mark

--
http://mark.stosberg.com/




> > I took an interest in the issue of HTTP header ordering and researched
> > what several other Perl modules do in regards to this as well as Ruby's
> > Rack. I published the result on my blog:
> >
> > http://mark.stosberg.com/blog/2010/01/generating-http-headers-sorted-or-unsorted.html
> >
> > The summary is that I support the option for unsorted headers in
> > HTTP::Headers. Michael Greb made a good case for it, and the
> > possibility for a performance improvement is attractive too.
>
> I would prefer if there was a way to make the sorted headers as fast
> as unsorted headers :-)

I have an idea which might work for this, which is different from the
approaches used in HTTP::Headers::Fast. I can try some experiments
privately and report back if it turns out to be a workable approach.

> Instead of introducing the 'as_string_without_sort' method could we
> achieve the same effect with an 'order' argument to 'as_string'? It could
> take values like 'sorted'/'original'/'dontcare'.

I think that would work equally well, and also allows for backwards
compatibility.
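
To make the proposal concrete, here is a rough sketch of how an 'order' option could behave. This is not HTTP::Headers code: headers_as_string is a stand-alone function invented for illustration, it takes the fields as a list of [name, value] pairs, and its 'sorted' mode is a plain case-insensitive sort rather than the ordering HTTP::Headers really uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an as_string() that accepts an 'order' option.
# $fields is an array ref of [name, value] pairs in insertion order.
sub headers_as_string {
    my ($fields, %opt) = @_;
    my $order = $opt{order} || 'sorted';    # default keeps existing behaviour
    my $eol   = defined $opt{eol} ? $opt{eol} : "\n";

    my @out;
    if ($order eq 'sorted') {
        # Simplified: case-insensitive by name, ties kept in insertion order.
        my @idx = sort {
            lc($fields->[$a][0]) cmp lc($fields->[$b][0]) || $a <=> $b
        } 0 .. $#$fields;
        @out = @{$fields}[@idx];
    }
    elsif ($order eq 'original' || $order eq 'dontcare') {
        # 'dontcare' may return any order; insertion order is the cheapest.
        @out = @$fields;
    }
    else {
        die "Unknown order '$order'\n";
    }
    return join '', map { "$_->[0]: $_->[1]$eol" } @out;
}

my $fields = [
    [ 'Server'       => 'Fool/1.0' ],
    [ 'Content-Type' => 'text/plain' ],
];
print headers_as_string($fields, order => 'original');
```

Because 'sorted' stays the default, existing callers that pass no argument would see no change.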

Mark

--
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Mark Stosberg Principal Developer
mark@summersault.com Summersault, LLC
765-939-9301 ext 202 database driven websites
. . . . . http://www.summersault.com/ . . . . . . . .

On Tue, Jan 26, 2010 at 17:34, Mark Stosberg <mark@summersault.com> wrote:
>
> In 2008 there was some discussion about an option to preserve the
> ordering of HTTP headers. Part of that thread is quoted below.
>
> The idea resurfaced in another form with the release of
> HTTP::Headers::Fast, which provided a method to get back the
> headers unsorted. However, the motivation was different there
> (performance) and the implementation was different as well. It returns
> headers in essentially random order instead of the order in which
> they were created or transmitted.
>
> I took an interest in the issue of HTTP header ordering and researched
> what several other Perl modules do in regards to this as well as Ruby's
> Rack. I published the result on my blog:
>
> http://mark.stosberg.com/blog/2010/01/generating-http-headers-sorted-or-unsorted.html
>
> The summary is that I support the option for unsorted headers in
> HTTP::Headers. Michael Greb made a good case for it, and the
> possibility for a performance improvement is attractive too.

I would prefer if there was a way to make the sorted headers as fast
as unsorted headers :-)

I still would like to see support for the ordering of headers
preserved at some point.

Instead of introducing the 'as_string_without_sort' method could we
achieve the same effect with an 'order' argument to 'as_string'? It could
take values like 'sorted'/'original'/'dontcare'.

--Gisle



I'm not comfortable with this patch. Some reasons:

- it's one big patch (actually there are 2)
- it does not follow the layout style in use, e.g. introduces cuddled elses
- it does various whitespace reformats
- introduces changes that are not optimizations
(like replacing $op eq "GET" with $op == $OP_GET)
- introduces unused functions, e.g. _header_push_no_return

I do like to see performance improvements, so I would not mind smaller
patches with demonstrated good effects. Having HTTP::Headers::Fast as
a benchmark to beat is good. I don't consider all the benchmarks in
[1] valid. I don't think the speed of pushing thousands of values
onto a header is important. The speed of getting and setting single
values is.

I'm also slightly annoyed by the HTTP::Headers::Fast author for
copying my code and then claiming[2] he wrote it.

[1] http://github.com/markstos/p5-http-headers-fast/blob/master/tools/benchmark.pl
[2] http://search.cpan.org/dist/HTTP-Headers-Fast/lib/HTTP/Headers/Fast.pm#AUTHOR

--Gisle


On Tue, Jan 26, 2010 at 17:38, Mark Stosberg <mark@summersault.com> wrote:
>
> I've now published some patches in "git" that port the performance
> improvements of HTTP::Headers::Fast back to HTTP::Headers:
>
> http://github.com/markstos/libwww-perl/commits/http-headers-fast
>
> The changes benchmark to be 10 to 20% faster on average and pass all of
> the HTTP::Headers regression tests.
>
> As I just mentioned in a previous post, it also adds a new method to
> generate the headers in an unsorted order, for better performance. The
> behavior of "as_string()" is not changed.
>
>  Mark


On 1 Feb 2010, at 16:34, Stanisław T. Findeisen wrote:

> Dirk-Willem van Gulik wrote:
> > On 1 Feb 2010, at 14:54, Stanisław T. Findeisen wrote:
> >
> >> HTTPS_CA_FILE ...
> >
> > If I recall correctly (and this may be a few years out of date) - this only works if you are relying on Net::SSL as the underlying SSL library. It doesn't work with IO::Socket::SSL.
>
> Good. How to use it? I thought "use LWP::UserAgent;" does the job? : http://search.cpan.org/~dland/Crypt-SSLeay-0.57/SSLeay.pm

They bubble up through there; though the latter supports SSL_ca_file and SSL_ca_path.

I think that the structure under the covers is
Crypt::SSLeay
Net::SSL
or
Net::SSLeay
Net::SSLeay::Handle
IO::Socket::SSL
Net::Server::Proto::SSL

But I'll leave the proper perl heads to reply :)

Dw.

Dirk-Willem van Gulik wrote:
> On 1 Feb 2010, at 14:54, Stanisław T. Findeisen wrote:
>
>> HTTPS_CA_FILE ...
>
> If I recall correctly (and this may be a few years out of date) - this only works if you are relying on Net::SSL as the underlying SSL library. It doesn't work with IO::Socket::SSL.

Good. How to use it? I thought "use LWP::UserAgent;" does the job? : http://search.cpan.org/~dland/Crypt-SSLeay-0.57/SSLeay.pm

STF

http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA D63F DBF5 8AA8 3B31 FE8A


On 1 Feb 2010, at 14:54, Stanisław T. Findeisen wrote:

> HTTPS_CA_FILE ...

If I recall correctly (and this may be a few years out of date) - this only works if you are relying on Net::SSL as the underlying SSL library. It doesn't work with IO::Socket::SSL.
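
One way to find out which SSL library ended up being used is to look at %INC, Perl's table of loaded files, after the first https request. The sketch below is only that, a sketch: loaded_ssl_backends is a helper name invented here, and the module list is simply the set of modules mentioned in this thread.

```perl
use strict;
use warnings;

# Report which of the SSL modules mentioned in this thread are currently
# loaded, by looking them up in %INC.
sub loaded_ssl_backends {
    my @known = ('Net::SSL', 'Crypt::SSLeay', 'IO::Socket::SSL', 'Net::SSLeay');
    my @loaded;
    for my $module (@known) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Net::SSL -> Net/SSL.pm
        push @loaded, $module if $INC{$file};
    }
    return @loaded;
}

# Call this after the first https request, once a backend has been loaded.
print "Loaded SSL modules: ",
    join(', ', loaded_ssl_backends()) || '(none)', "\n";
```

If Net::SSL shows up in the list, the HTTPS_CA_FILE and HTTPS_CA_DIR environment variables are at least being read by the library they belong to.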

Thanks,

Dw.




Hello

How to restrict the set of CA certificates this library uses to validate server (peer) certificates?

I tried this simple program:

#!/usr/bin/perl

use warnings;
use strict;
use LWP::UserAgent;

$ENV{HTTPS_VERSION} = 3;
$ENV{HTTPS_DEBUG} = 1;
$ENV{HTTPS_CA_DIR} = '/var/log/';
$ENV{HTTPS_CA_FILE} = '/etc/ssl/certs/Wells_Fargo_Root_CA.pem';
print('LWP version: ' . ($LWP::VERSION) . "\n");

my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => 'https://sourceforge.net/account/login.php');
my $res = $ua->request($req);

print ("Status: " . ($res->status_line) . "\n");

if ($res->is_success) {
    print ('issuer : ' . ($res->header('Client-SSL-Cert-Issuer')) . "\n");
    print ('subject : ' . ($res->header('Client-SSL-Cert-Subject')) . "\n");
    print ('cipher : ' . ($res->header('Client-SSL-Cipher')) . "\n");
}

but the output is:

LWP version: 5.813
SSL_connect:before/connect initialization
SSL_connect:SSLv3 write client hello A
SSL_connect:SSLv3 read server hello A
SSL_connect:SSLv3 read server certificate A
SSL_connect:SSLv3 read server done A
SSL_connect:SSLv3 write client key exchange A
SSL_connect:SSLv3 write change cipher spec A
SSL_connect:SSLv3 write finished A
SSL_connect:SSLv3 flush data
SSL_connect:SSLv3 read finished A
Status: 200 OK
issuer : /C=US/O=Equifax/OU=Equifax Secure Certificate Authority
subject : /C=US/O=sourceforge.net/OU=3754508056/OU=See www.geotrust.com/resources/cps (c)09/OU=Domain Control Validated - QuickSSL(R)/CN=sourceforge.net
cipher : RC4-MD5

If I, however, connect to a local site with self-signed certificate I get this:

SSL_connect:before/connect initialization
SSL_connect:SSLv3 write client hello A
SSL_connect:SSLv3 read server hello A
SSL3 alert write:fatal:bad certificate
SSL_connect:error in SSLv3 read server certificate B
SSL_connect:before/connect initialization
SSL_connect:SSLv2 write client hello A
SSL_connect:failed in SSLv2 read server hello A
Status: 500 SSL negotiation failed:

which is nice. So, it looks like these settings:

$ENV{HTTPS_CA_DIR} = '/var/log/';
$ENV{HTTPS_CA_FILE} = '/etc/ssl/certs/Wells_Fargo_Root_CA.pem';

are ineffective? I am setting $ENV{HTTPS_CA_DIR} to '/var/log/' so that it is set to something valid but with no certificates. Setting this to undef or skipping this line doesn't help.

What's wrong? (Wells_Fargo_Root_CA.pem doesn't look like Equifax.) Am I using Crypt::SSLeay? How to know that?

I know 5.813 is not the newest version but this is the one in the current Debian GNU/Linux distro...

STF

http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA D63F DBF5 8AA8 3B31 FE8A


Mark,

Thank you so much for your reply. I will do that.

Best wishes,
Yun-an
>Yun-an,
>
>The test that failed was a live test, meaning it ran against a live website
>and could have failed for a reason caused by the server rather than
>your software. It is probably safe to ignore. You can just do a "make
>install" to skip it.
>
> Mark

On Thu, 28 Jan 2010 15:51:47 +0530
bipin Nayak <nbipin78@gmail.com> wrote:

> Thanks for adding me to this group.
>
> Following is the script and result I am getting:-

Bipin,

I tried the script as you gave it and it gave the source of the login
page as the result, *not* a 301. Are you using the latest versions of
LWP::UserAgent and WWW::Mechanize? I just downloaded the latest
versions from CPAN.

Mark

--
http://mark.stosberg.com/



On Wed, 27 Jan 2010 12:47:39 +0100
Yun-an Yan <yun-an.yan@uni-rostock.de> wrote:

> Dear All,
>
> I cannot pass the test when I try to install Frameready-1.020.
> Would somebody please help me?

Yun-an,

The test that failed was a live test, meaning it ran against a live website
and could have failed for a reason caused by the server rather than
your software. It is probably safe to ignore. You can just do a "make
install" to skip it.

Mark

--
http://mark.stosberg.com/



Thanks for adding me to this group.

Following is the script and result I am getting:-



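use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

my $url = 'https://www.platts.com/Login.aspx?';

# autocheck => 1 makes failed requests die instead of failing silently
my $mechanize = WWW::Mechanize->new(autocheck => 1);
$mechanize->cookie_jar(HTTP::Cookies->new);

# The User-Agent string must be on one line; a line break inside the quotes
# would put a newline into the header value
$mechanize->agent('Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1) Gecko/20061010 (IKDhPmJcdw) Firefox/2.0');

my $response = $mechanize->get($url);
my $html = $mechanize->content;
print "$html \n";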



Output of script is:-



<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.platts.com?">here</a>.</p>
</body></html>



It should return the source of the page at https://www.platts.com/Login.aspx?, but
instead it returns the error “301 Moved Permanently”.



Can anyone please help me with this? I have tried a lot but have not been able
to fix this issue.



Thanks in advance!



Thanks & Regards,

Bipin Nayak (Bipin)

Dear All,

I cannot pass the test when I try to install Frameready-1.020.
Would somebody please help me?

$ perl -v

This is perl, v5.8.9 built for darwin-2level

$ uname -a
Darwin *.*.*.* 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01
PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

$ make test
/opt/local/bin/perl t/TEST 0
base/ua...........ok
html/form.........ok
local/autoload....ok
local/frames......ok
local/http........ok
local/protosub....ok
live/cpan.........Server closed connection without sending any data
back at /opt/local/lib/perl5/site_perl/5.8.9/Net/HTTP/Methods.pm line
345.
live/cpan.........dubious
Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-2
Failed 2/2 tests, 0.00% okay
Failed Test Stat Wstat Total Fail List of Failed
-------------------------------------------------------------------------------
live/cpan.t 255 65280 2 3 1-2
Failed 1/7 test scripts. 2/46 subtests failed.
Files=7, Tests=46, 6 wallclock secs ( 0.74 cusr + 0.15 csys = 0.89
CPU)
Failed 1/7 test programs. 2/46 subtests failed.
make: *** [test] Error 255


Sincerely,
Yun-an


I've now published some patches in "git" that port the performance
improvements of HTTP::Headers::Fast back to HTTP::Headers:

http://github.com/markstos/libwww-perl/commits/http-headers-fast

The changes benchmark to be 10 to 20% faster on average and pass all of
the HTTP::Headers regression tests.

As I just mentioned in a previous post, it also adds a new method to
generate the headers in an unsorted order, for better performance. The
behavior of "as_string()" is not changed.

Mark

--
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Mark Stosberg Principal Developer
mark@summersault.com Summersault, LLC
765-939-9301 ext 202 database driven websites
. . . . . http://www.summersault.com/ . . . . . . . .



In 2008 there was some discussion about an option to preserve the
ordering of HTTP headers. Part of that thread is quoted below.

The idea resurfaced in another form with the release of
HTTP::Headers::Fast, which provided a method to get back the
headers unsorted. However, the motivation was different there
(performance) and the implementation was different as well. It returns
headers in essentially random order instead of the order in which
they were created or transmitted.

I took an interest in the issue of HTTP header ordering and researched
what several other Perl modules do in regards to this as well as Ruby's
Rack. I published the result on my blog:

http://mark.stosberg.com/blog/2010/01/generating-http-headers-sorted-or-unsorted.html

The summary is that I support the option for unsorted headers in
HTTP::Headers. Michael Greb made a good case for it, and the
possibility for a performance improvement is attractive too.

Mark

On Sun, 7 Sep 2008 15:53:46 +0200
"Gisle Aas" <gisle@aas.no> wrote:

> On Sun, Sep 7, 2008 at 1:49 PM, Michael Greb <mgreb@linode.com> wrote:
> > On Sep 5, 2008, at 7:23 PM, Gisle Aas wrote:
> >>
> >> True; and in this case we need to define what happens when fields are
> >> modified with 'push', 'set' or 'init' and 'remove' as that's the API
> >> that modify stuff. Let me suggest the following definition of the
> >> behaviour:
> >>
> >> - 'push' always appends the field at the end of all headers. Multiple
> >> occurrences of a field name do not have to be consecutive.
> >>
> >> - 'init' either does nothing or it works like 'push'.
> >>
> >> - 'remove' will always remove all occurrences of a field.
> >>
> >> - 'set' will work like 'push' if no other occurrence of the field exists.
> >>
> >> - 'set' will update the first occurrence if the field exists (and
> >> remove all other occurrences). If multiple field values are provided
> >> with 'set' they are basically all injected at the location of the
> >> first existing value.
> >
> >
> > On Sep 6, 2008 at 2:57 AM, Gisle Aas wrote:
> >>
> >> I think it makes sense to be able to enable them separately.
> >> Suggested interface:
> >>
> >> $h->scan(\&cb, original_order => 1, original_case => 1);
> >> $h->as_string(eol => "\n", original_order => 1, original_case => 1);
> >
> > The attached patch uses the interface above and works towards the behavior
> > outlined in the first message. Due to the headers being stored as a hash,
> > pushing does not currently preserve previous values, second and subsequent
> > pushes of the same header will overwrite the previous value. Supporting
> > this would require a change in how the headers are stored within the module.
> > Your thoughts?
>
> I think it's better to just use your original approach and just keep
> the representation like it used to be, with the addition of an array that
> records the original field names and their order. This should lead to
> a smaller patch as the only thing that needs to change is the code that
> sets headers and the scan method. I also like header lookups to be
> efficient and the representation compact.
>
> > Server: Fool/1.0
> > content-encoding: gzip
> > Content-Type: text/plain; charset="UTF-8"
> > Content-Encoding: base64
> > Date: Fri Sep 5 10:24:37 CEST 2008
> >
> > Would be stored as (assuming push_header):
>
> My suggestion would be:
>
> bless {
>     "content-encoding" => ["\n gzip", "base64"],
>     "content-type" => "text/plain; charset=\"UTF-8\"",
>     "date" => "Fri Sep 5 10:24:37 CEST 2008",
>     "server" => "Fool/1.0",
>     "::original_fields" => [
>         "Server",
>         "content-encoding",
>         "Content-Type",
>         "Content-Encoding",
>         "Date",
>     ],
> }, "HTTP::Headers";
>
> The invariant that needs to hold is that there is the same number of
> elements in {"::original_fields"} as there are values for all the
> other keys.
>
> Pushing a value is trivial; only change from what we have now is
> appending the original field name to {"::original_fields"}.
>
> The only state modification operation that becomes more complex is
> setting of a header value. It has to:
>
> - update the values in the hash as before
> - locate the first occurrence of the field name in
> {"::original_fields"} => $idx
> - remove all other occurrences of the field name
> - splice(@{"::original_fields"}, $idx, 1, ($orig_field_name) x
> $numbers_of_values_set);
>
> When 'scan' wants to iterate over the original headers it would have
> to keep an index into the values array for each field that repeats.
>
> A more compact representation could be to store {"::original_fields"}
> as a ":"-separated string; but we can think about that optimization
> later.
>
> --Gisle
>
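
The representation described in the quoted thread can be sketched roughly as follows. This is an illustration under assumptions, not the patch that was discussed: the function names are invented, values are always kept in array refs for simplicity (the real module also stores single values as plain scalars), and 'set' records the field name exactly as passed to it.

```perl
use strict;
use warnings;

# Values live in a hash keyed by lowercased field name; the special key
# below records each occurrence's original spelling in arrival order.
my $ORDER = '::original_fields';

sub new_headers { return +{ $ORDER => [] } }

# 'push': append the value and remember the original field name.
sub push_header {
    my ($h, $name, $value) = @_;
    push @{ $h->{ lc $name } }, $value;
    push @{ $h->{$ORDER} }, $name;
    return;
}

# 'set': replace all values; the new entries take the place of the first
# existing occurrence, and later occurrences are removed.
sub set_header {
    my ($h, $name, @values) = @_;
    my $key = lc $name;
    $h->{$key} = [@values];

    my (@new, $seen);
    for my $field (@{ $h->{$ORDER} }) {
        if (lc($field) ne $key) { push @new, $field; next }
        next if $seen++;                   # drop later occurrences
        push @new, ($name) x @values;      # inject at the first position
    }
    push @new, ($name) x @values unless $seen;    # no occurrence: like 'push'
    @{ $h->{$ORDER} } = @new;
    return;
}

# 'scan' in original order, keeping a per-field index into the values.
sub scan_original {
    my ($h, $cb) = @_;
    my %next;
    for my $field (@{ $h->{$ORDER} }) {
        my $key = lc $field;
        $cb->($field, $h->{$key}[ $next{$key}++ ]);
    }
    return;
}

my $headers = new_headers();
push_header($headers, 'Server',           'Fool/1.0');
push_header($headers, 'content-encoding', 'gzip');
push_header($headers, 'Content-Encoding', 'base64');
scan_original($headers, sub { print "$_[0]: $_[1]\n" });
```

The invariant from the thread holds after every call: the number of entries under "::original_fields" equals the total number of stored values.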


--
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Mark Stosberg Principal Developer
mark@summersault.com Summersault, LLC
765-939-9301 ext 202 database driven websites
. . . . . http://www.summersault.com/ . . . . . . . .


Re: content_charset and $_ by Gisle Aas

at 05:59 AM, 01/03/2010

On Jan 3, 2010, at 22:22 , Father Chrysostomos wrote:

> I came across a bug in HTTP::Message::content_charset. It has a ‘local $_’, which is unnecessary, since foreach loops already localise their topic. In fact, they do it in a safer way that is more like local *_=\do{my$x}. Using local $_ causes problems if $_ is tied. The attached patch removes the offending line and adds a test for it. This did actually occur in real code, and is not just a theoretical problem.
> <open_vsKSmGR8.txt>

Thanks. Applied as <http://github.com/gisle/libwww-perl/commit/0d33cd81894ed02f7005ca742206e90511338880>.

--Gisle

I came across a bug in HTTP::Message::content_charset. It has a ‘local
$_’, which is unnecessary, since foreach loops already localise their
topic. In fact, they do it in a safer way that is more like local *_=
\do{my$x}. Using local $_ causes problems if $_ is tied. The attached
patch removes the offending line and adds a test for it. This did
actually occur in real code, and is not just a theoretical problem.
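
A small sketch of the point about foreach: the loop aliases $_ to each element and restores the caller's $_ afterwards, so no explicit 'local $_' is needed. The function first_charset below is a made-up stand-in, not the actual content_charset code, and the sketch does not try to reproduce the tied-$_ failure itself.

```perl
use strict;
use warnings;

# foreach aliases $_ to each element and restores the previous binding when
# the loop ends (even on an early return), so 'local $_' is not needed here.
sub first_charset {
    my (@candidates) = @_;
    for (@candidates) {
        return lc $1 if /charset=([\w-]+)/i;
    }
    return;
}

$_ = 'caller value';
my $found = first_charset('text/html', 'text/html; charset=UTF-8');
print "found=$found, \$_ is still '$_'\n";
```

After the call, $_ should still hold the caller's value, which is the behaviour the removed line was trying to guarantee by hand.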

On Wed, Nov 11, 2009 at 06:34:24PM -0800, Ilya Zakharevich wrote:
> A good example of the report of failure is
> http://www.nntp.perl.org/group/perl.cpan.testers/2009/11/msg5926497.html
> Essentially, the error message is
>
> Getting GP/PARI from ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/
> Cannot list (Illegal PORT command.
> ): at utils/Math/PariBuild.pm line 319.
>
> Can't fetch file with Net::FTP, now trying with LWP::UserAgent...
> Not in this directory, trying `ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/OLD/'...
> then it shows that the response via LWP for ..../OLD is empty (of type
> text/ftp-dir-listing), and shows that `ftp -pinegv' has no problem

I got the first non-Unix report with a similar failure. See
http://www.nntp.perl.org/group/perl.cpan.testers/2009/11/msg6061071.html

It goes like this: =======================================================

Getting GP/PARI from ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/
Not in this directory, now chdir('OLD')...
Can't use an undefined value as a symbol reference at C:/home/stro/perl5111/lib/Net/FTP/dataconn.pm line 54.

Can't fetch file with Net::FTP, now trying with LWP::UserAgent...
Not in this directory, trying `ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/OLD/'...
Can't fetch directory listing from ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/OLD/: 500 Can't use an undefined value as a symbol reference
Content-Type: text/plain
Client-Date: Thu, 19 Nov 2009 17:32:57 GMT
Client-Warning: Internal response

500 Can't use an undefined value as a symbol reference

=======================================================

Still puzzled,
Ilya

Hi all,
I have just figured out the problem. In the end, I need to request 3
web pages in succession:
first, login: "theWebServer/main"
second, toShowTimeSlot: "theWebServer/ShowTimeslot?parameters=selected"
third, toMakeBooking: "theWebServer/bookingRecord?parameters=selected&etc"
The first step is a must. What puzzled me is that I have to activate
the third step by a simple access to the second web page, although I
already have every parameter needed for the third request. I think maybe
the server writes something to the cookie during that access.





Thanks, Gisle.
My access to the web server can be divided into 3 steps:
1. Login. This succeeds: I get the same content using UserAgent as when
I log in through the web browser.
2. ToABookingSystem. This succeeds. This second web page contains some
JavaScript, and in the web browser it goes to the third web page after
clicking one link. The following is the corresponding web page code:

<a href="#" onClick="childWin =
window.open('request?someparameter=a&otherparameter=b&etc','childWin','width=390,height=350,scrollbars=yes,status=yes,resizable=yes')"><P
style="Font-Size:12px"><img src="../pic.gif"
border="0"></p></a></div></td></tr>


3. Submit my request. This third web page is opened from the second one. I
took the window.open "url" from the second web page (mentioned above), pasted
it into the web browser, and got the same failure result as I get from the
Perl script. But after I clicked the "pic.gif" on the second web page in
the web browser, I re-entered the "url" of the third web page, and this time
the browser returned just what I wanted.
It seems that the "click" action has activated something I haven't noticed.

Has anyone here come across similar problems before?
Any advice will be appreciated.

Yuan SHANG


> On Sat, Nov 14, 2009 at 16:58, SHANG Yuan <spl@ust.hk> wrote:
>> I want to submit my request to a website, and I successfully parse the
>> "get" parameters.
>> When I paste the
>> url('http://mywebsite/search?aa=1&bb=2&JSEnable=true')
>> to firefox, it returns a successful value. However,
>> $response=$ua->get($url);
>> did not return the successful value.
>> Did any kind spirit help me figure it out?
>
> You need to provide more information about the actual server you try
> to connect to in order to get any real advice. My guess is that the
> server cares about the User-Agent or Cookie headers. Sniffing the
> traffic that the browser generates when talking to the server might be
> instructive.
>
> --Gisle
>
>
>>
>> By the way,the website need username and password.I have use the
>> statements below to deal with it. And It seems work well, since I can
>> get the content of the webpages without problems.
>>
>> my $response = $ua->post( $url,
>> [ loginID=>$username,
>> passwd => $passwd,
>> ]
>> );
>>
>>
>>
>


Re: UserAgent->get problem by Gisle Aas

at 16:01 PM, 11/14/2009

On Sat, Nov 14, 2009 at 16:58, SHANG Yuan <spl@ust.hk> wrote:
>  I want to submit my request to a website, and I successfully parse the
> "get" parameters.
>   When I paste the url('http://mywebsite/search?aa=1&bb=2&JSEnable=true')
> to firefox, it returns a successful value. However,
>     $response=$ua->get($url);
>   did not return the successful value.
> Did any kind spirit help me figure it out?

You need to provide more information about the actual server you try
to connect to in order to get any real advice. My guess is that the
server cares about the User-Agent or Cookie headers. Sniffing the
traffic that the browser generates when talking to the server might be
instructive.

--Gisle


>
>  By the way,the website need username and password.I have use the
> statements below to deal with it. And It seems work well, since I can
> get the content of the webpages without problems.
>
> my $response = $ua->post( $url,
>    [  loginID=>$username,
>         passwd => $passwd,
>    ]
>  );
>
>
>

UserAgent->get problem by SHANG Yuan

at 23:58 PM, 11/13/2009

Hi all,
I want to submit my request to a website, and I have successfully parsed the
"get" parameters.
When I paste the URL ('http://mywebsite/search?aa=1&bb=2&JSEnable=true')
into Firefox, it returns a successful result. However,
$response=$ua->get($url);
does not return the successful result.
Could any kind soul help me figure it out?

By the way, the website needs a username and password. I use the
statements below to deal with it. It seems to work well, since I can
get the content of the web pages without problems.

my $response = $ua->post( $url,
    [ loginID => $username,
      passwd  => $passwd,
    ]
);


No idea what the problem is offhand. I'll try to find some time to
debug this during the weekend.

--Gisle

On Thu, Nov 12, 2009 at 03:34, Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
> There are many "yellow" reports for Math::Pari in the smoke test
> database.  They come from failures to download C source code for the
> library Math::Pari need.  Net::FTP and LWP fail (kinda mysteriously);
> usual `ftp -pinegv' I run for debugging purposes succeeds.
>
> The recent versions of Math::Pari have some code to debug these
> failures; however, the debugging output says nothing to me (IIRC, most
> of the code to auto-fetch was contributed).  Any help is appreciated.
>
> Thanks,
> Ilya
>
> =======================================================
> A good example of the report of failure is
>  http://www.nntp.perl.org/group/perl.cpan.testers/2009/11/msg5926497.html
> Essentially, the error message is
>
>    Getting GP/PARI from ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/
>    Cannot list (Illegal PORT command.
>    ):  at utils/Math/PariBuild.pm line 319.
>
>    Can't fetch file with Net::FTP, now trying with LWP::UserAgent...
>    Not in this directory, trying `ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/OLD/'...
>
> then it shows that the response via LWP for ..../OLD is empty (of type
> text/ftp-dir-listing), and shows that `ftp -pinegv' has no problem
> getting the listings (for both directories) and the file.
>
> The code which emits these messages is in
>  http://cpansearch.perl.org/src/ILYAZ/Math-Pari-2.0304_00108060102/utils/Math/PariBuild.pm
>
> starting from
>  require Net::FTP;
>
>

There are many "yellow" reports for Math::Pari in the smoke test
database. They come from failures to download C source code for the
library Math::Pari need. Net::FTP and LWP fail (kinda mysteriously);
usual `ftp -pinegv' I run for debugging purposes succeeds.

The recent versions of Math::Pari have some code to debug these
failures; however, the debugging output says nothing to me (IIRC, most
of the code to auto-fetch was contributed). Any help is appreciated.

Thanks,
Ilya

=======================================================
A good example of the report of failure is
http://www.nntp.perl.org/group/perl.cpan.testers/2009/11/msg5926497.html
Essentially, the error message is

Getting GP/PARI from ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/
Cannot list (Illegal PORT command.
): at utils/Math/PariBuild.pm line 319.

Can't fetch file with Net::FTP, now trying with LWP::UserAgent...
Not in this directory, trying `ftp://megrez.math.u-bordeaux.fr/pub/pari/unix/OLD/'...

then it shows that the response via LWP for ..../OLD is empty (of type
text/ftp-dir-listing), and shows that `ftp -pinegv' has no problem
getting the listings (for both directories) and the file.

The code which emits these messages is in
http://cpansearch.perl.org/src/ILYAZ/Math-Pari-2.0304_00108060102/utils/Math/PariBuild.pm

starting from
require Net::FTP;

Greetings,

I am using a UserAgent with a callback to handle a streaming connection. How does one terminate the connection client-side? How do you close the connection other than by exiting the program?

my $browser = LWP::UserAgent->new();

$response = $browser->get($surl, ':content_cb' => \&read_stream);

sub read_stream {...}

Thanks,

Sylvain
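
One documented way to stop such a transfer from the client side is to
die() inside the content callback: LWP::UserAgent then aborts the
request and reports the message in the X-Died response header. A
minimal sketch, assuming a byte limit as the stop condition; the helper
name fetch_until and the limit are illustrative and not from the
original post.

```perl
use strict;
use warnings;

# Hypothetical helper: read a stream until $limit bytes have arrived.
sub fetch_until {
    my ($surl, $limit) = @_;

    # Loaded here so the file compiles even where LWP is not installed.
    require LWP::UserAgent;
    my $browser = LWP::UserAgent->new();

    my $seen = 0;
    my $read_stream = sub {
        my ($chunk, $response, $protocol) = @_;
        $seen += length $chunk;
        # Dying inside the callback makes LWP abort the request.
        die "enough data\n" if $seen >= $limit;
    };

    my $response = $browser->get($surl, ':content_cb' => $read_stream);

    # After an abort, the die message is available as the X-Died header.
    my $died = $response->header('X-Died');
    return ($response, $seen, $died);
}
```

The program keeps running after get() returns, so the stop condition
can be anything the callback is able to test.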

Bill Moseley schrieb:
> Now, if you for some reason really want to encode to latin1,

Actually, I need to encode to the charset (encoding) that is used in the
document fetched by LWP::UserAgent. I don't mind if that is UTF-8. :-)
But often it is latin1.

Best regards,

Oliver Block

I'm trying to learn from this, too. Please (anyone) correct me if I'm wrong
below.

On Thu, Oct 15, 2009 at 11:18 AM, Oliver Block <lists@oliver-block.eu> wrote:

>
> I think I've found out what causes the problem. As I mentioned earlier
> the content of a td tag in my case "&raquo; Kontakt &nbsp;&rsaquo;
> Kontaktformular" will be represented by the following ... characters
> (?) "\x{bb} Kontakt \x{a0}\x{203a} Kontaktformular" and the reason seems
> to be that there is nothing like a character representation in the
> ISO-8859-1 encoding. The codepoint (for &rsaquo;) is U+203A or &#8250;
> This seems to be a legal character in ISO-8859-1-encoded html documents
> when it appears in the form of a character entity reference.
>

Well, I think you are slightly mixing things there. But, it's probably more
about terminology.

The 8 letters and symbols that make up "&rsaquo;" are all valid ISO-8859-1
code points. The character that it represents is not an ISO-8859-1
character. One point of the entity is to allow the browser to render
characters that are not in the encoding used to transmit the document from
the server to the browser.

What I think is happening in your case is when parsing the *entities* they
end up as wide characters so Perl has to promote the text to a wide
character -- that is it's setting the utf8 flag on the data so that Perl can
represent the *character*.

Now, this won't happen if you don't have entities (well, entities that
represent wide characters). If, for example, you have just utf8 characters
in the web page you are parsing and don't decode it (which I consider a
programming error) then you won't end up with the utf8 flag on. That is, you
have octets instead of characters inside Perl.

And w/o the utf8 flag set you won't get "wide character in print" errors,
either, so you don't even know you are doing it wrong. ;)
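
The difference between octets and characters described above can be
seen with the core Encode module alone. A small sketch; the sample
string is an assumption chosen for illustration (the UTF-8 encoding of
"é"), not data from the page under discussion.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# "é" as it arrives off the wire in UTF-8: two octets.
my $octets = "\xc3\xa9";

# Undecoded, Perl sees two separate one-byte "characters".
my $undecoded_length = length $octets;

# Decoded, Perl sees the single character U+00E9.
my $characters     = decode_utf8($octets);
my $decoded_length = length $characters;

printf "undecoded: %d, decoded: %d, code point: U+%04X\n",
    $undecoded_length, $decoded_length, ord $characters;
```

Nothing warns in the undecoded case, which is why the mistake is easy
to miss.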

So, what I think you should do is:

$octets = $response->content;
$tree->parse( decode_utf8( $octets ) ); # assuming you know it's utf8

But, HTTP::Response determines how to decode for you, so do:

$tree->parse( $response->decoded_content );

Now you have characters inside perl. Entities no longer exist -- just
characters.


> So, changing the parameter for as_HTML from
>
> $tree->as_HTML('<>&');
>
>
> to
>
> $tree->as_HTML();
>

That depends on what you want to do.

If you are creating a web page AND you set the encoding in the HTTP headers
to utf8 then it's this:

print encode_utf8( $tree->as_HTML( '<>&' ) );

Which says generate the HTML, and convert <,> and & to entities inside text
elements. No need to create entities for other characters (like &rsaquo;).

Now, if you for some reason really want to encode to latin1, then you are
right, you do this:

print encode( 'iso-8859-1', $tree->as_HTML, Encode::FB_CROAK );

The $tree->as_HTML will convert "unsafe" characters to entities that can be
represented in Latin1.

But, I'd stick with encoding to utf-8. Decode and encode character data at
the edges of your program.
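
To illustrate the two output choices without HTML::TreeBuilder, here is
a sketch using only the core Encode module. The string is the text node
from Oliver's dump; everything else (variable names, the use of
LEAVE_SRC so the source string is not modified) is an assumption made
for the example.

```perl
use strict;
use warnings;
use Encode qw(encode encode_utf8);

# The text node from the thread, as characters inside Perl.
my $characters = "\x{bb} Kontakt \x{a0}\x{203a} Kontaktformular";

# UTF-8 can represent every character, so this always succeeds.
my $utf8_octets = encode_utf8($characters);

# ISO-8859-1 has no code for U+203A, so with FB_CROAK encode() dies
# unless the character was turned into an entity beforehand.
my $latin1_octets = eval {
    encode('iso-8859-1', $characters,
        Encode::FB_CROAK | Encode::LEAVE_SRC);
};
my $latin1_error = $@;

print "utf-8 octets: ", length($utf8_octets), "\n";
print "latin1 failed: $latin1_error" unless defined $latin1_octets;
```

This is why the bare as_HTML call matters for the latin1 case: it
replaces U+203A with an entity before encode() ever sees it.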




--
Bill Moseley
moseley@hank.org

Bill Moseley schrieb:
> So, in general, I would bring character data into Perl like:
>
> my $characters = $response->decoded_content;
>
> Then you work with $characters as needed.
>
> And then when you want to output you convert back to whatever encoding
> you need:
>
> $utf8_octets = encode_utf8( $characters );
>
> send_to_client( $utf8_octets );
>
> For your case you might try $tree->parse( $response->decoded_content
> ); Or, if you have raw utf-8 octets that you need to parse I think
> you can call $tree->utf8_mode( 1 ) to tell the parser to decode. But,
> I'd prefer the first.
>
That seems to be a good idea. There are only some modifications I have
to make, because incoming documents do not always have the same
encoding. It can be latin1 or utf-8 or others. Those who create the web
pages are not always that precise. That's why HTML::Parser is such a
good choice in these cases, because it is tolerant.


I thought that not touching the encoding would be the best idea, but
decoding characters with code points higher than 255 seems to be better.
But it might also be a good idea to use $response->decoded_content and
later encode the content again. At least if $response always provides
a ->content_charset.

Thank you.

Best regards,

Oliver Block

Oliver Block schrieb:
> (You will find the perl code at the end)
>
> A close look to the dump of $tree and a comparison with
> $response->content showed the following:
>
> The following markup from $response->content
>
> <td colspan="8" align="left" bgcolor="#FFFFFF" class="Rubrik">&raquo;
> Kontakt &nbsp;&rsaquo; Kontaktformular</td>
>
> appears in tree as
>
> bless( {
> '_parent' =>
> $VAR1->{'_content'}[1]{'_content'}[0]{'_content'}[1]{'_content'}[5],
> '_content' => [
> "\x{bb} Kontakt \x{a0}\x{203a} Kontaktformular"
> ],
> 'colspan' => '8',
> 'align' => 'left',
> 'bgcolor' => '#FFFFFF',
> '_tag' => 'td',
> 'class' => 'Rubrik'
> }, 'HTML::Element' )
>
> If you have any idea how to avoid the conversion to utf8 and how to
> assure the the output of $tree->as_HTML() can be saved in the same
> encoding as stated in $response, please tell it.
>
>
I think I've found out what causes the problem. As I mentioned earlier
the content of a td tag in my case "&raquo; Kontakt &nbsp;&rsaquo;
Kontaktformular" will be represented by the following ... characters
(?) "\x{bb} Kontakt \x{a0}\x{203a} Kontaktformular" and the reason seems
to be that there is nothing like a character representation in the
ISO-8859-1 encoding. The codepoint (for &rsaquo;) is U+203A or &#8250;
This seems to be a legal character in ISO-8859-1-encoded html documents
when it appears in the form of a character entity reference.

So, changing the parameter for as_HTML from

$tree->as_HTML('<>&');


to

$tree->as_HTML();


solves the problem because now all "unsafe" characters (e.g. "\x{203a}")
are encoded as entities within as_HTML(). Therefore there is no need for
perl to encode the complete string to UTF-8 when using join() (see code
at the end). That's at least what perluniintro mentions:

"Internally, Perl currently uses either whatever the native eight-bit
character set of the platform (for example Latin-1) is, defaulting to
UTF-8, to encode Unicode strings. Specifically, if all code points in
the string are 0xFF or less, Perl uses the native eight-bit character
set. Otherwise, it uses UTF-8." (perldoc perluniintro)

That's at least how I make sense of it.
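
A small check of the perluniintro passage, using only core Perl. It is
a sketch with assumed variable names: joining a string whose code
points are all 0xFF or less with one that contains U+203A changes how
Perl stores the result, but not the characters in it.

```perl
use strict;
use warnings;

my $narrow = "\x{bb} Kontakt \x{a0}";     # all code points <= 0xFF
my $wide   = "\x{203a} Kontaktformular";  # contains U+203A

my $joined = join('', $narrow, $wide);

# The characters are unchanged; only Perl's internal storage differs.
my $same_length = length($joined) == length($narrow) + length($wide);
my $same_prefix = substr($joined, 0, length $narrow) eq $narrow;

# utf8::is_utf8 reports the internal storage flag (a debugging aid only).
my $flagged = utf8::is_utf8($joined) ? 1 : 0;

print "same length: $same_length, same prefix: $same_prefix, ",
      "flag: $flagged\n";
```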

Best regards,

Oliver Block



> Oliver Block schrieb:
>
>> Hello everyone,
>>
>> the following code is used to load a web page from a certain web server
>> and parse it into an html tree. At the end a variable is assigned the
>> string representation of that tree.
>>
>> use LWP::UserAgent;
>> use HTML::TreeBuilder;
>>
>> my $ua = LWP::UserAgent->new;
>> my $response = $ua->get($form->{'url'});
>>
>> my $tree = HTML::TreeBuilder->new();
>> $tree->parse($response->content);
>>
>> # ...
>> # encoding of content of $tree is ISO-8859-1 at this point
>> $template = $tree->as_HTML('<>&');
>>
>> # encoding of content of $template is UTF-8
>>
>> Now the following problem arises. The encoding of the content of
>> $template (UTF-8) is not the same than the content of $tree
>> (ISO-8859-1). So it is obvious, that as_HTML converts the encoding to UTF-8.
>>
>> I debugged everything and everythings is fine up to the last line of code of sub HTML::Element::as_HTML which is:
>>
>> return join('', @html, "\n");
>>
>> This would mean that join seems to modify the encoding of the content.
>>
>> Any suggestions?
>>

On Wed, Oct 14, 2009 at 6:00 PM, Oliver Block <lists@oliver-block.eu> wrote:

>
> my $ua = LWP::UserAgent->new;
> my $response = $ua->get($form->{'url'});
>
> my $tree = HTML::TreeBuilder->new();
> $tree->parse($response->content);
>
> # ...
> # encoding of content of $tree is ISO-8859-1 at this point
> $template = $tree->as_HTML('<>&');
>
> # encoding of content of $template is UTF-8
>
> Now the following problem arises. The encoding of the content of
> $template (UTF-8) is not the same than the content of $tree
> (ISO-8859-1). So it is obvious, that as_HTML converts the encoding to
> UTF-8.
>

I'm not really sure what the problem is, sorry. But, the terminology above
seems a bit off.

UTF-8 and ISO-8859-1 are encodings (encoded octets), not characters.
Characters are an abstraction. You should use characters inside Perl and
encoded octets outside. (Ignore the fact that Perl's internal encoding is
UTF-8 and just pretend they are character abstractions.)

So, in general, I would bring character data into Perl like:

my $characters = $response->decoded_content;

Then you work with $characters as needed.

And then when you want to output you convert back to whatever encoding you
need:

$utf8_octets = encode_utf8( $characters );

send_to_client( $utf8_octets );

For your case you might try $tree->parse( $response->decoded_content ); Or,
if you have raw utf-8 octets that you need to parse I think you can call
$tree->utf8_mode( 1 ) to tell the parser to decode. But, I'd prefer the
first.

(One thing I'm not clear on is when or if the parsers detect encoding by
looking for a charset in the content. XML::LibXML will use the <?xml
encoding= from the content, for example. But I'm not clear if the
HTML::Parser will look at an encoding set in a <meta> tag.)






--
Bill Moseley
moseley@hank.org

(You will find the perl code at the end)

A close look at the dump of $tree and a comparison with
$response->content showed the following:

The following markup from $response->content

<td colspan="8" align="left" bgcolor="#FFFFFF" class="Rubrik">&raquo;
Kontakt &nbsp;&rsaquo; Kontaktformular</td>

appears in tree as

bless( {
'_parent' =>
$VAR1->{'_content'}[1]{'_content'}[0]{'_content'}[1]{'_content'}[5],
'_content' => [
"\x{bb} Kontakt \x{a0}\x{203a} Kontaktformular"
],
'colspan' => '8',
'align' => 'left',
'bgcolor' => '#FFFFFF',
'_tag' => 'td',
'class' => 'Rubrik'
}, 'HTML::Element' )

Last but not least a snippet of hexdump from the html markup
($response->content) above

00001f10  09 3c 74 64 20 63 6f 6c 73 70 61 6e 3d 22 38 22  |.<td colspan="8"|
00001f20  20 61 6c 69 67 6e 3d 22 6c 65 66 74 22 20 62 67  | align="left" bg|
00001f30  63 6f 6c 6f 72 3d 22 23 46 46 46 46 46 46 22 20  |color="#FFFFFF" |
00001f40  63 6c 61 73 73 3d 22 52 75 62 72 69 6b 22 3e 26  |class="Rubrik">&|
00001f50  72 61 71 75 6f 3b 20 4b 6f 6e 74 61 6b 74 20 26  |raquo; Kontakt &|
00001f60  6e 62 73 70 3b 26 72 73 61 71 75 6f 3b 20 4b 6f  |nbsp;&rsaquo; Ko|
00001f70  6e 74 61 6b 74 66 6f 72 6d 75 6c 61 72 3c 2f 74  |ntaktformular</t|
00001f80  64 3e                                            |d>|

I still do not understand why that happens but join does certainly not
cause it.

If you have any idea how to avoid the conversion to utf8 and how to
ensure that the output of $tree->as_HTML() can be saved in the same
encoding as stated in $response, please tell me.

Best Regards,

Oliver Block


Oliver Block schrieb:
> Hello everyone,
>
> the following code is used to load a web page from a certain web server
> and parse it into an html tree. At the end a variable is assigned the
> string representation of that tree.
>
> use LWP::UserAgent;
> use HTML::TreeBuilder;
>
> my $ua = LWP::UserAgent->new;
> my $response = $ua->get($form->{'url'});
>
> my $tree = HTML::TreeBuilder->new();
> $tree->parse($response->content);
>
> # ...
> # encoding of content of $tree is ISO-8859-1 at this point
> $template = $tree->as_HTML('<>&');
>
> # encoding of content of $template is UTF-8
>
> Now the following problem arises. The encoding of the content of
> $template (UTF-8) is not the same than the content of $tree
> (ISO-8859-1). So it is obvious, that as_HTML converts the encoding to UTF-8.
>
> I debugged everything and everythings is fine up to the last line of code of sub HTML::Element::as_HTML which is:
>
> return join('', @html, "\n");
>
> This would mean that join seems to modify the encoding of the content.
>
> Any suggestions?
>
>
> Best Regards,
>
> Oliver Block
>
>
>

Hello everyone,

the following code is used to load a web page from a certain web server
and parse it into an html tree. At the end a variable is assigned the
string representation of that tree.

use LWP::UserAgent;
use HTML::TreeBuilder;

my $ua = LWP::UserAgent->new;
my $response = $ua->get($form->{'url'});

my $tree = HTML::TreeBuilder->new();
$tree->parse($response->content);

# ...
# encoding of content of $tree is ISO-8859-1 at this point
$template = $tree->as_HTML('<>&');

# encoding of content of $template is UTF-8

Now the following problem arises. The encoding of the content of
$template (UTF-8) is not the same as that of the content of $tree
(ISO-8859-1). So it is obvious that as_HTML converts the encoding to UTF-8.

Is this behavior of as_HTML known? Will this be changed?

Best Regards,

Oliver Block

Hello everyone,

the following code is used to load a web page from a certain web server
and parse it into an html tree. At the end a variable is assigned the
string representation of that tree.

use LWP::UserAgent;
use HTML::TreeBuilder;

my $ua = LWP::UserAgent->new;
my $response = $ua->get($form->{'url'});

my $tree = HTML::TreeBuilder->new();
$tree->parse($response->content);

# ...
# encoding of content of $tree is ISO-8859-1 at this point
$template = $tree->as_HTML('<>&');

# encoding of content of $template is UTF-8

Now the following problem arises. The encoding of the content of
$template (UTF-8) is not the same as that of the content of $tree
(ISO-8859-1). So it is obvious that as_HTML converts the encoding to UTF-8.

I debugged everything and everything is fine up to the last line of code of sub HTML::Element::as_HTML, which is:

return join('', @html, "\n");

This would mean that join seems to modify the encoding of the content.

Any suggestions?


Best Regards,

Oliver Block



Can anyone confirm this problem before the web page starts to respond again?
On Windows it fails after 22 seconds. On Linux it fails after 189 seconds! Is this a Linux problem or a Perl-on-Linux problem? Can something be done to avoid this long timeout?

On Windows:
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
my $starttime = time;
my $response = $ua->get('https://enet.kbcat.com/clients/JVTest/default.aspx');
if ($response->is_success) {
    print "request sucess\n";
}
else {
    print $response->status_line . "\n";
}
my $endtime = time;
print "time used: ".($endtime - $starttime)." seconds\n";
^D
500 Connect failed: connect: Unknown error; Unknown error
time used: 22 seconds

On Linux (CentOS release 5.3 (Final)):
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
my $starttime = time;
my $response = $ua->get('https://enet.kbcat.com/clients/JVTest/default.aspx');
if ($response->is_success) {
    print "request sucess\n";
}
else {
    print $response->status_line . "\n";
}
my $endtime = time;
print "time used: ".($endtime - $starttime)." seconds\n";

500 Connect failed: connect: Connection timed out; Connection timed out
time used: 189 seconds
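
The LWP::UserAgent timeout is an inactivity timeout, so the whole
request can take longer than the value given. One common workaround,
not verified against this server, is to wrap the call in an
alarm-based deadline. The helper name with_deadline below is
hypothetical, alarm is not reliable on Windows, and it may not
interrupt every blocking system call, so treat this as a sketch only.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds of
# wall-clock time. Returns (1, @results) on completion, (0) on timeout.
sub with_deadline {
    my ($seconds, $code) = @_;
    my @results;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "deadline\n" };
        alarm $seconds;
        @results = $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending
    return (1, @results) if $ok;
    die $err unless $err eq "deadline\n";   # re-throw unrelated errors
    return (0);
}
```

Usage would look like
my ($done, $response) = with_deadline(15, sub { $ua->get($url) });
with $done false when the deadline was hit.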

On Mon, Sep 21, 2009 at 09:34, Jesper Jørgen Persson
<jjperss@hotmail.com> wrote:
> require LWP::UserAgent;
> require HTTP::Request;
> $request = HTTP::Request->new(GET => 'http://www.utm.edu/departments/finearts/calen.htm');
> $ua = LWP::UserAgent->new();
> $response = $ua->request($request);
> $content_charset = $response->content_charset();
> ^D
> Can't use an undefined value as an ARRAY reference at C:/Perl/site/lib/HTTP/Message.pm line 251.
> Summary of my perl5 (revision 5 version 8 subversion 9)
> libwww version: 5.831

LWP ends up confused about the meta elements with empty content on
that page. The following fix will appear in 5.832:

--- a/lib/HTTP/Message.pm
+++ b/lib/HTTP/Message.pm
@@ -248,6 +248,7 @@ sub content_charset
if (my $c = $attr->{content}) {
require HTTP::Headers::Util;
my @v = HTTP::Headers::Util::split_header_words($c);
+ return unless @v;
my($ct, undef, %ct_param) = @{$v[0]};
$charset = $ct_param{charset};
}
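
The effect of the added guard can be reproduced without LWP. In this
sketch split_words is a simplified stand-in written for the example (it
is not HTTP::Headers::Util::split_header_words), and the two charset_*
subs are hypothetical names that mirror the patched lines.

```perl
use strict;
use warnings;

# Simplified stand-in for split_header_words: one array ref per
# non-blank value, shaped as (type, undef, key, value, ...).
sub split_words {
    my ($content) = @_;
    return () unless defined $content && $content =~ /\S/;
    my ($type, @params) = split /\s*;\s*/, $content;
    my @pairs = map { my ($k, $v) = split /\s*=\s*/, $_, 2; ($k, $v) } @params;
    return ([ $type, undef, @pairs ]);
}

# Without the guard, dereferencing $v[0] of an empty list dies.
sub charset_unguarded {
    my @v = split_words(shift);
    my ($ct, undef, %ct_param) = @{ $v[0] };
    return $ct_param{charset};
}

# With the guard from the patch, an empty list yields no charset.
sub charset_guarded {
    my @v = split_words(shift);
    return unless @v;
    my ($ct, undef, %ct_param) = @{ $v[0] };
    return $ct_param{charset};
}
```

Calling charset_unguarded(" ") inside an eval reproduces the "Can't use
an undefined value as an ARRAY reference" message, while
charset_guarded(" ") just returns nothing.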


require LWP::UserAgent;
require HTTP::Request;
$request = HTTP::Request->new(GET => 'http://www.utm.edu/departments/finearts/calen.htm');
$ua = LWP::UserAgent->new();
$response = $ua->request($request);
$content_charset = $response->content_charset();
^D
Can't use an undefined value as an ARRAY reference at C:/Perl/site/lib/HTTP/Message.pm line 251.
Summary of my perl5 (revision 5 version 8 subversion 9)
libwww version: 5.831
/Jesper Persson


Sorry for the noise!
I was not using the latest version of libwww.
After upgrading the decoded_content works perfectly on the page.
Thanks Gisle.
Best Regards
Jesper Persson
> Date: Tue, 15 Sep 2009 17:08:49 +0200
> Subject: Re: decoded_content and page with content-type application/xhtml+xml; charset=utf-8
> From: gisle@aas.no
> To: jjperss@hotmail.com
> CC: libwww@perl.org
>
> On Tue, Sep 15, 2009 at 10:06, Jesper Jørgen Persson
> wrote:
>>
>> message->decoded_content doesn't decode this page:
>> http://www.silkeborgbibliotekerne.dk/om+bibliotekerne/kontakt/sp%c3%b8rg+bibliotekaren
>>
>> because the content-type is: application/xhtml+xml
>> and decoded_content expects the content-type to start with "text"
>> from message.pm: if ($ct && $ct =~ m,^text/,,) {
>
> What version of LWP is this? The code does not look like that any more.
>
>> according to this page: http://www.w3.org/TR/xhtml-media-types/#application-xhtml-xml
>> the content-type should be correct.
>> What should I do ? Manually do something like: $content_ref = \Encode::decode($charset, $$content_ref);
>
> My guess is that upgrading LWP will help.
>
> --Gisle


The Perl Review, Spring 2008

at 05:11 AM, 03/20/2008

Issue 4.2 (Spring 2008)

Table of Contents

Compiling My Own perl—brian d foy (first page)
FMTIEWTK About Closures—Johan Lodin (first page)
Expecting Perl—Mark Schoonover (first page)
Perl and Undecidability—Jeffrey Kegler (first page)
The Year in Perl, 2007—brian d foy (first page)

The Perl Review, Fall 2007

at 05:05 AM, 03/20/2008

Issue 4.0 (Fall 2007)

Table of Contents

Templating My Output—Alberto Manuel Simões (first page)
Making My Own CPAN—brian d foy (first page)
Programming Parrot—Jonathan Scott Duff (first page)
Komodo Test Drive—Jim Brandt (first page)
Named Captures in Perl 5.9.5—brian d foy (first page)

The Perl Review, Winter 2007

at 05:03 AM, 03/20/2008

Issue 4.1 (Winter 2007)

Table of Contents

Simple Web Access—Alberto Manuel Simões (first page)
Parrot Status Report—Jonathan Scott Duff (first page)
Mapping Op Codes—Eric Maki (first page)
CPANdeps—David Cantrell (first page)
HTML Slides—Grant McLean (first page)
Alter Egos—Anno Siegel (first page)

Perlcast interviews brian d foy about The Perl Review version 2.3.

Found Perl, now a Flickr Photo Pool

at 00:51 AM, 08/13/2007

Found Perl is a section of The Perl Review's website where we posted pictures of Perl paraphernalia or the word "Perl" in the wild. We started it a long time ago, and were very slow in updating it when people would send in photos.

Now "Found Perl" is a photo pool on Flickr. Instead of sending them to TPR, just add them to the pool.

Schwern's Shirt, now on Flickr

at 00:46 AM, 08/13/2007

The Perl Review's website has had a section on Schwern's Shirt, the orange monstrosity that brian d foy bought at the charity auction for The Perl Foundation at the 2004 Open Source Convention.

Now we've moved the section of the website to a Flickr group for Schwern's shirt. This way, anyone can add their photos of Schwern's shirt to the group. Instead of being infrequently updated on the website, people can add them as soon as they upload them to Flickr.

If you don't have a Flickr account and don't want to create one, you can still send them to The Perl Review by mailing them to editors@theperlreview.com.

The Perl Review Date Format Challenge

at 16:56 PM, 04/04/2006

I just posted this challenge to Perlmonks, and it's the first TPR code challenge, I guess:


David Pogue had a momentary lapse of judgement when he proclaimed in his blog that the date sequence 01:02:03 04/05/06 will only happen once in all of human history.

Besides the obvious gaffes of date formatting (which one is the month and which one is the year?), the red herring of leading zeros (to make the minute and second stand out), and so on, no one who's seen this has made the comment that calendars say whatever we want them to say and the numbers are only special because we set the calendar up that way in this one case. What about the Chinese, Hebrew, and Muslim calendars?

So this seems like a good challenge to publish in The Perl Review: using the Perl Date modules (or not, I guess), in how many different calendars and formats can you make this sequence? What else is special about those days (are they a weekend, fall on a full moon, have a solar eclipse, etc.)?


Read more about it on Perlmonks...

The Economics of Newsstands

at 00:35 AM, 03/27/2006

Powell's Technical Books in Portland (that's the one on 33 NW Park Avenue) is going to carry The Perl Review. It's our first newsstand distribution.

I had to set a newsstand price. The deal basically works like this: bookstores keep most of the profit. Magazines make money when the single-issue buyers turn into subscribers. After Powell's cut, which we set at 40%, and my costs, $2 an issue, I have to figure out a price that also motivates people to give the money to The Perl Review directly instead of the book store. That's why you see big discounts for magazines when you subscribe: that's the real price, and everything else is markup. The Powell's price ends up being $5, which is 50 cents more than the subscription price.

That's not to say that newsstands are bad. It's like better-than-free advertising since it sits on the shelf and I cover my costs plus a little more for every issue sold.

Forget about absolute numbers for a moment. At my price point, if they sell 75% of the copies, I break even. That would be fine with me because any copy sitting on the actual magazine display means people see that issue. Some might subscribe later even if they don't buy it. Now that I have a price point, I have to figure out the right number of issues. That's something I just have to guess.

I left 16 copies of the Spring 2005 issue, but I also have to consider that I sold about 10 at the Intermediate Perl book signing. We'll see how that goes.

Now, a good magazine accountant has to keep track of the actual number of newsstand sales too. As much as I'd like to pretend that we sell every single copy, the Post Office wants to know where all the issues went to verify that we abide by all the periodical rules. It's not enough for the newsstand to simply tell me what they sold. They certainly aren't going to tell me they sold everything when they didn't since that's money out of their pocket. They can't really tell me they sold nothing because that's money out of my pocket.

If you're a late night person living in a city, you might have seen a bunch of guys tearing off the front pages of newspapers and magazines. Instead of sending back the unused copies, they send back the cover (and they do that for books too). Every cover they don't send back is a sale that they owe me money for.

You might think those unsold issues represent lost money, but they really don't. They are a sunk cost, meaning that I would have spent that money regardless of the sales. That starts way back at the printers when I have to decide how many issues I want. That number includes all subscriptions, complimentary copies, samples to user groups, and all the issues I'll need to fulfill orders for back issues. Not only that, but the more copies I print, the lower the incremental cost (the cost per each copy). Each printing job has a fixed overhead for the job preparation, machine set-up, and so on. That's the make ready. I end up printing many more copies than I need, partly to amortize the make ready. Not selling at the newsstand is slightly better than not selling while sitting in boxes in the office. At least people see them at the newsstand.

Remember that magazines make money on subscriptions, so that's the goal. I don't care about selling more at the newsstands. If someone subscribes because they see an issue on the newsstand, the profit from the subscription pays for about three unsold newsstand copies, so five subscriptions from people seeing the issue at Powell's would make up for no sales at all. That's just breaking even, and nobody makes any money. That also means I'm spending $6 to get a new subscriber.

If you're already despondent, you don't want to read about distributors. Most bookstores don't want to deal with every individual publisher. They'd have to keep track of a separate deal for every magazine. Instead, they want to deal with a single source in the same way they deal with books. I know my costs, and I know the newsstand's cut, and I have a price point that I can't change too much because people won't buy it at too high a price. If I use a distributor, perhaps to get into the big chain book stores, they are going to want a big cut too. I'll end up either breaking even or losing money on every newsstand copy, and I'll want to convert that to a subscription as soon as possible. That's why you see so many wonderful subscription cards in the magazines.

So far I've just talked about money from sales. We can also sell advertising, which we do for the special friends of the Perl community. Since magazines know they are going to lose money at the newsstand, they make up the difference with paid advertising. Ever wonder why magazines such as Wired are mostly advertisements? That's making up for the money they'll lose on the newsstands. Remember when I talked about keeping track of the number of copies sold? Advertisers want to know those numbers. They don't care how many copies the newsstand bought. They care about the number of copies that shoppers bought. That sets the rate at which the magazines can sell ads. More eyeballs equal more dollars. There's a separate industry of companies that audit magazines to verify the numbers. That's even more money that gets sucked away.

The short story? Subscribe to the magazines you like. It's the only way they can survive.

RSS for interviews

at 09:48 AM, 01/16/2006

Last year, I started including interviews with Perl people on The Perl Review website, but I didn't add a feed for that. Now I have, and it's on the RSS page.

It was actually Aristotle over at use.perl who pointed out the missing feed by scraping the site and creating his own.

The web site is actually a directory processed by Template Toolkit, so what I really need to do is add indexing support as ttree goes through the directories, then spit out pages as it does that. That sounds like a magazine article...

New book review news; guideline links

at 15:31 PM, 01/04/2006

Every issue I get a couple of book reviews that don't quite cut it. Techies tend to present too many sides of the story: rather than express their opinion, they equivocate by pointing out all the holes in their own opinion. In short, they are entirely too nice.

Being nice isn't a bad thing, but what people really want out of a book review is a recommendation. "Should I buy this book?" It doesn't matter if people agree with you as long as you are fair to the book. After that, people want to know the particulars: who, what, when, where, and how.

So far, The Perl Review has taken reviews from several different people, and very few people have provided multiple reviews. That worked when we were first getting started, but now I think we need something different. Since book reviews are about opinions, and the reader doesn't have to agree with the reviewer, I think readers need to know the reputation and leaning of the reviewer to make their own decision. For instance, my wife doesn't agree with movie-reviewer Roger Ebert, but based on his negative reviews she knows which movies she will like. At the same time, she knows that certain people on LiveJournal like the same sorts of movies she does, so she can trust them.

In line with all that, I think I'm going to move towards recurring book reviewers, and do more to establish their reputation. That also means that I want to get a couple of reviewers who think about books differently so I can give more readers someone that thinks like them. I've approached a couple of people, and we'll see how it turns out.

I've added two guides I hadn't run across previously:

I'm disappearing for a bit

at 01:30 AM, 12/01/2005

It seems that every time I put out a new issue I have to leave town for a couple weeks. I don't think the two are related. Travel is just too busy this year. Add to that finishing up the Alpaca book and I've had a pretty busy week. I'll be back in the middle of December.

The Perl Review 2.1 ready for download

at 01:21 AM, 12/01/2005

I'm sending out the email announcements as I post this. Once you get the new password, you'll be able to download the latest issue of The Perl Review.

In this issue:


  • Seven Sins of Perl OO Programming -- chromatic
  • Hash Anti-Patterns -- Alberto Manuel Simões
  • Haskell for Perlers -- Frank Antonsen
  • PerlWar -- Yanick Champoux
  • book reviews, commentary, news and more...


Some of you neglected to renew, but I won't bug you in email anymore. To find out more about the sampler on the cover, you'll have to renew that subscription.

If you haven't subscribed yet, now's the time because I have to raise prices next year when the US postal rates go up.

We now have a permanent link on Perl.com. We're way at the bottom, but that's okay.

I went looking for other mentions of TPR on O'Reilly sites and found that they have cited many of our book reviews.

We even have some of our book reviews cited on Amazon.com.

Reviewer opportunities

at 10:54 AM, 11/12/2005

We're looking for reviewers for ActiveState's Komodo and OSoft's ThoutReader. Let us know if you're interested.

It's renewal time!

at 10:52 AM, 11/12/2005

The next issue of The Perl Review comes out in a couple of weeks, so it's time to renew if you've already finished your first year subscription!

This may seem odd for some who just recently subscribed, but with the magic of time travel we let you pretend that your subscription started much earlier by allowing you to choose which issue to start your subscription. You may have subscribed yesterday but selected to start with issue 1.0 (Winter 2004), meaning that your subscription ran out with our last issue, 1.3 (Fall 2005).

I've found that most new subscribers like to get most of the back issues at first, and it's worked out rather well for us. Now that we're starting our second year, though, I have to figure out how to make that work so people don't have to renew right away. I'll have to add a two-year subscription, I think.

Renewing subscribers

at 18:44 PM, 08/21/2005

I've been reading a lot about magazine renewal rates. A lot of publications seem to be happy to get 25% renewals, and they actually spend a fair bit of money to get that. Ever wonder why you get so many magazine renewal letters in your postal mail?

For TPR, since I don't have a lot of money, I simply used email. The target audience is certainly technology-adept, so that's not so bad. So far I've sent out three renewal emails. One was the week before OSCON, and I got about a 45% response for that. That's already good for magazines, but not quite the 98% renewal rate TPJ had in its first year. I sent out a second email two weeks after that, and another one over the weekend. I think I'm floating somewhere around 75% renewal right now.

I have a list of all of the people who haven't renewed (it's an SQL view ;), and a lot of them should have and I would be surprised if they don't. Although email provided me with a virtually free way to get that 75%, I also think email has a tendency to get lost. First, if the subscriber can't deal with it right away, it joins the long queue of messages that get ignored forever. Second, it has a pretty good chance of being blocked by a spam filter. Third, some people might get so much mail that they just don't get to ever see it. I do send each mail individually since each contains a personal renewal code, so I'm not blasting out spam to any user at each domain.

I am considering sending out some postcards to the hold-outs. I figure that will cost me about $50, which means that I need about four renewals to make up for it. If I get four renewals, that's paid for. I'm also going to try emailing people directly. It certainly helps when other people talk about TPR. I noticed a big spike in renewals when Ovid talked about his upcoming Logic Programming article.

There's another trick to getting renewals: special promotional subscriptions. Magazines give you a special rate for a limited time then hit you up for the full price as quickly as possible. It's all part of the ad-selling game. Advertisers want to know how many "qualified subscribers" you have, which means how many people actually want your magazine enough to pay for it. Getting people to be full subscribers raises those numbers.

TPR is a bit different because I don't aim to make it an audited magazine, meaning that no one is going to come in and put a stamp on our books saying they verified our subscriber base. I'm not in it to sell advertising (let's see if I'm saying the same thing in three years). So far I've only taken advertising from people the Perl community already loves and trusts. Most of that is just filling up the other full color pages that come along with the cover.

TPR got a mention in Northeastern's tiny article about Ian Langworth.

Curiously, they dropped every name they could find, except mine. Oh well. :)

Review PrimalScript IDE

at 21:27 PM, 07/25/2005

Anyone want to review PrimalScript, Sapien's IDE? It says it has Perl support.

Matching up records

at 05:19 AM, 07/24/2005

I now have a big stack of emails confirming renewal transactions, and I need to shove those transactions into my local database so people keep getting their magazines.

Most of them are pretty easy because they are keyed on their unique ID in the database. A lot of people skipped the link I sent them and went to the subscription page directly, so I have to match up those with what I see in the database.

So far it works like this and handles 95% of the records:


  1. Get all the matches by name

    1. With one match, we're probably done
    2. With no match, try different parts of the name (last, first)
    3. With multiple matches, we need to match something else too.

  2. From the name matches, compare email addresses. If they match, we're done.
  3. At this point we probably have several candidate matches.
  4. Look at parts of the address (country, city, address). If the first two are matches, we're pretty sure we have a match.
  5. Look at the email if we might have a match and compare the user portion to the new email and the stored email. Do the same for the host portion. This is just to raise the confidence level a bit.
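
The steps above might be sketched roughly like this. It is only an illustration, not the actual TPR code: the record fields (name, email, country, city), the sample subscribers, and the match_renewal function are all made up for the example, and step 5 (comparing the user and host portions of the email) is left out to keep it short.

```perl
use strict;
use warnings;

# Hypothetical subscriber records; the field names are assumptions.
my @subscribers = (
    { id => 1, name => 'Jane Doe',   email => 'jane@example.com', country => 'US', city => 'Portland' },
    { id => 2, name => 'Jane Doe',   email => 'jdoe@example.org', country => 'CA', city => 'Toronto'  },
    { id => 3, name => 'John Smith', email => 'john@example.com', country => 'US', city => 'Chicago'  },
);

# Return the one likely subscriber, or nothing when a human should check.
sub match_renewal {
    my ($renewal, $subscribers) = @_;

    # Step 1: match the full name, then fall back to the last name alone.
    my @candidates = grep { lc $_->{name} eq lc $renewal->{name} } @$subscribers;
    if (!@candidates) {
        my ($last) = $renewal->{name} =~ /(\S+)\s*$/;
        @candidates = grep { $_->{name} =~ /\b\Q$last\E$/i } @$subscribers
            if defined $last;
    }
    return unless @candidates;                   # nobody by that name
    return $candidates[0] if @candidates == 1;   # one match: probably done

    # Step 2: among the name matches, an identical email settles it.
    my @by_email = grep { lc $_->{email} eq lc $renewal->{email} } @candidates;
    return $by_email[0] if @by_email == 1;

    # Step 4: country and city both agreeing is strong evidence.
    my @by_place = grep {
        lc $_->{country} eq lc $renewal->{country}
            && lc $_->{city} eq lc $renewal->{city}
    } @candidates;
    return $by_place[0] if @by_place == 1;

    return;    # still ambiguous: leave it for the manual pile
}
```

Anything that comes back undefined goes to the by-hand pile, which keeps the automatic part conservative.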


This leaves about 5% that I need to check by hand. It's the first time I've had to go through with this so I'm being very cautious. Thank god for test suites.

It's renewal time

at 23:58 PM, 07/16/2005

Now that we've made it through the first year, it's time to get people to renew. I've sent out renewal emails, but I'm guessing 10% of them will never make it into an inbox. Spam has just about ruined email for anything.

If you've already received four issues, it's time for you to renew, and you can use the subscription form and subscribe again.

Either way, I have an interesting task ahead. Since I don't store any personal information on the Pair.com servers that power our website, and we never store credit card information or do recurring billing, I get to match up the renewals with existing subscribers. It's easy to know which transactions are renewals, and each email I've sent has a link with a query string that I can link back to a subscriber. However, people might not follow the link but go directly to the website, or all sorts of other things that don't let me see that code. We'll see how that goes.

In related matters, I was rewriting the code bits that parse the email I get so I can shove all that stuff into the database. I was doing fancy things with regexen, then Template::Extract, and some other things, and although it was a lot of fun, it was a big waste of time. Since I really just wanted to suck it into another program, why not send it as a ready-to-use data structure? It's easy enough to freeze things and thaw them later. I still see my nicely formatted template, but at the end I include the Perl-ready data. Besides the trivial parsing to isolate that, I'm ready to import things into the database. Things can be too simple to be seen sometimes.
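
Here's a minimal sketch of that idea. The post doesn't say which serializer is involved, so this assumes Data::Dumper; the marker lines, the order fields, and the build_message and extract_order names are invented for the example.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical marker lines; anything unlikely to show up in the
# human-readable part of the message would do.
my $BEGIN = '--- BEGIN PERL DATA ---';
my $END   = '--- END PERL DATA ---';

# Build the formatted message and append the same data in Perl-ready form.
sub build_message {
    my ($order) = @_;
    local $Data::Dumper::Terse    = 1;   # bare structure, no "$VAR1 ="
    local $Data::Dumper::Indent   = 1;
    local $Data::Dumper::Sortkeys = 1;
    my $pretty = "Renewal from $order->{name} <$order->{email}>\n";
    return join "\n", $pretty, $BEGIN, Dumper($order), $END, '';
}

# Isolate the embedded block and turn it back into a hash reference.
sub extract_order {
    my ($message) = @_;
    my ($dump) = $message =~ /^\Q$BEGIN\E\n(.*?)^\Q$END\E$/ms
        or return;
    # The leading + makes Perl parse the braces as a hash, not a block.
    # String eval is only reasonable for mail you generated yourself.
    my $order = eval "+$dump";
    die "Could not read embedded data: $@" if $@;
    return $order;
}
```

The only parsing left is the one regex that finds the block between the markers. Since eval runs whatever is in there, a format such as JSON would be the safer choice if the mail could come from anyone else.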

Anyone want to review any of these books? Let brian know soon. The article deadline is at the end of the month.


  • Pro Perl (Apress)
  • Perl Testing: A Developer's Notebook (O'Reilly)
  • Learning Perl, 4th edition (O'Reilly)

I finally meet Eric Maki

at 08:44 AM, 06/29/2005

I finally got to meet Eric Maki, TPR's designer, in person. We went through the current issue and looked at a lot of the design things we might want to change, and we also have an idea for an upcoming project. I'll have details later.

If you haven't seen the latest cover (Summer 2005), get yourself an issue. It's so nice that we're going to make posters of it.

I'll be on Perlcast soon

at 08:42 AM, 06/29/2005

After a long day at YAPC where I taught a 4-day Learning Perl course in a single day, and then a 5 hour boat dinner cruise, Josh McAdams of Perlcast interviewed me about The Perl Review. I'm not sure when he'll publish it.

The cool thing is that Josh is moving to Chicago. I might be able to get on Perlcast a bit more often. :)

The Best Software Writing reviewer

at 21:14 PM, 06/20/2005

I got an advance copy of The Best Software Writing from Apress. It's a collection of writings compiled by Joel Spolsky. Anyone want to take a whack at reviewing this for the next issue?

In the mail today

at 11:13 AM, 06/15/2005

The Perl Review Summer 2005 issues should be in the mail today. The printer was screwing around again. They've been a disaster. You know the sort of people: they can't just do something without sending four emails back and forth, each with some new reason why things can't happen right away.

Ugh. My apologies everyone.

Reprinting old articles

at 01:28 AM, 12/01/2004

I'm thinking about making a special edition print issue with the best articles from the first two years. Good or bad idea? How much would you pay for that sort of thing? Which articles would you want?

Book Reviews for the next issue

at 19:17 PM, 10/25/2004

The next issue is scheduled for December 1, and I'm looking for book reviews. Any recent book might work, but I'm specifically interested in reviews of:



Reviews should be 200 to 300 words and should reflect your opinion. Say explicitly who should buy this book and why they should, or that nobody should buy the book. No matter what you say, be nice.

First post

at 19:49 PM, 10/20/2004

This community is for public discussion of The Perl Review. You may also like to check out perl_review. Post away, although you need to join first.

Perl Reference

  • sorting a hash by keys - leave $a and $b as they are; they are the special variables that sort sets for each comparison.
    my @sortedKeys = sort { $a <=> $b } keys %hashName;         # numerical sort
    my @sortedKeys = sort { $a cmp $b } keys %hashName;         # ASCII-betical sort
    my @sortedKeys = sort { lc($a) cmp lc($b) } keys %hashName; # case-insensitive alphabetical sort

  • sorting a hash by values - again, leave $a and $b as they are. The result is still a list of keys, ordered by the values they point to.
    my @sortedKeys = sort { $hashName{$a} <=> $hashName{$b} } keys %hashName;         # numerical sort
    my @sortedKeys = sort { $hashName{$a} cmp $hashName{$b} } keys %hashName;         # ASCII-betical sort
    my @sortedKeys = sort { lc($hashName{$a}) cmp lc($hashName{$b}) } keys %hashName; # case-insensitive alphabetical sort
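
A small runnable example of both kinds of sort, using a made-up %count hash. The second sort goes a little past the one-liners above: it sorts by value from highest to lowest and falls back to the key when two values tie.

```perl
use strict;
use warnings;

# Hypothetical data: a count for each word.
my %count = (perl => 12, Awk => 3, sed => 3, python => 7);

# Keys in case-insensitive alphabetical order.
my @by_key = sort { lc($a) cmp lc($b) } keys %count;

# Keys ordered by value, highest first; ties are broken by the key itself.
my @by_value = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

print "@by_key\n";      # prints "Awk perl python sed"
print "@by_value\n";    # prints "perl python Awk sed"
```

Swapping $a and $b in a comparison is all it takes to reverse the order.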

2008 scandalz.net
If you have to hate, hate gently.