----- Forwarded message from QA Team Robot <qa@altlinux> -----
Package: perl-LWP-Parallel-2.57-alt1.1
Packager: Alexey Tourbin <at@altlinux>
Build Statistics:
  3 time(s) (last time: Thu Apr 15 2004) by Alexey Tourbin <at@altlinux>
  1 time(s) (last time: Sat Feb 19 2005) by ALT QA Team Robot <qa-robot@altlinux>
Status: i586 rebuild failed. Please investigate.
+ /usr/bin/make test 'CP=/bin/cp -p'
make: Entering directory `/usr/src/RPM/BUILD/ParallelUserAgent-2.57'
Manifying blib/man3/LWP::Parallel::Protocol::http.3pm
Manifying blib/man3/LWP::Parallel::UserAgent.3pm
Manifying blib/man3/LWP::RobotPUA.3pm
Manifying blib/man3/LWP::Parallel::Protocol::ftp.3pm
Manifying blib/man3/Bundle::ParallelUA.3pm
Manifying blib/man3/LWP::ParallelUA.3pm
Manifying blib/man3/LWP::Parallel::RobotUA.3pm
Manifying blib/man3/LWP::Parallel::Protocol.3pm
/usr/bin/perl t/TEST
local/compatibility....Can't locate object method "parse_head" via package "LWP::Parallel::Protocol::http" at ../blib/lib/LWP/Parallel/UserAgent.pm line 1497, <DAEMON> line 1.
Missing base argument at local/compatibility.t line 57
dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-20
        Failed 20/20 tests, 0.00% okay
local/file.............Can't locate object method "parse_head" via package "LWP::Parallel::Protocol::file" at ../blib/lib/LWP/Parallel/UserAgent.pm line 1497.
dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 2-4
        Failed 3/4 tests, 25.00% okay
local/http.............Can't locate object method "parse_head" via package "LWP::Parallel::Protocol::http" at ../blib/lib/LWP/Parallel/UserAgent.pm line 1497, <DAEMON> line 1.
Missing base argument at local/http.t line 85
dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 2-15
        Failed 14/15 tests, 6.67% okay
local/timeouts.........Can't locate object method "parse_head" via package "LWP::Parallel::Protocol::http" at ../blib/lib/LWP/Parallel/UserAgent.pm line 1497, <DAEMON> line 1.
dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 2-25
        Failed 24/25 tests, 4.00% okay
Failed Test           Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
local/compatibility.t  255 65280    20   40  1-20
local/file.t           255 65280     4    6  2-4
local/http.t           255 65280    15   28  2-15
local/timeouts.t       255 65280    25   48  2-25
Failed 4/4 test scripts. 61/64 subtests failed.
Files=4, Tests=64, 30 wallclock secs ( 0.76 cusr +  0.10 csys =  0.86 CPU)
Failed 4/4 test programs. 61/64 subtests failed.
make: Leaving directory `/usr/src/RPM/BUILD/ParallelUserAgent-2.57'
make: *** [test] Error 255
RPM build errors:
error: Bad exit status from /usr/src/tmp/rpm-tmp.48552 (%build)
    Bad exit status from /usr/src/tmp/rpm-tmp.48552 (%build)
Command exited with non-zero status 1
1.88user 0.28system 0:32.16elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+51981minor)pagefaults 0swaps
hsh-rebuild: rebuild of `perl-LWP-Parallel-2.57-alt1.1.src.rpm' failed.
Command exited with non-zero status 1
Unfortunately 5.815 broke download-to-file on Windows. I've now uploaded 5.816 to CPAN. This release only adds the binmode() statement that was missing.
On Thu, Sep 25, 2008 at 11:35:40AM +0200, Gisle Aas wrote: > > Do the url-encoded post parameters have to be of > > a given character encoding or is that just an agreement between the > > sender and receiver? > > There certainly has to be agreement between the sender and the > receiver. I thought the normal behaviour was to encode using the same > encoding as the document the form was embedded in uses.
Yes, that's why I use accept-charset=utf8 on my forms. BTW, I looked at a Firefox post with Wireshark and there's no charset added to the urlencoded content type. It seems like I've seen examples of adding a charset to that content type.
In absence of a charset in the post I guess the server just has to assume it's encoded as requested (with accept-charset). That's what I do.
Now, in the case that triggered this question there is no form -- just documentation that says "post to this url", and the url has ?encoding=utf8 if the content is utf8.
> > If that's the case then it would seem like query_param should die if > > it receives any strings with the utf8 flag on. You can't encode_utf8 > > or utf8::downgrade because we don't know what (octet) encoding that > > the sender and receiver agreed on. > > I basically agree with that view. It can still be convenient to have > it assume UTF-8 encoding in this case, and there is the potential that > introducing this strictness breaks code.
With something as widely used as URI I don't think you can make that change. But a warning would probably be safe. Maybe a package variable could be used to enable exceptions.
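The warning-by-default, exception-on-request idea could look roughly like the sketch below. The package name `My::QueryCheck`, the variable `$STRICT` and the function `check_param` are all made up for illustration; nothing like this is claimed to exist in URI.

```perl
use strict;
use warnings;

# Hypothetical package, variable and function names, for illustration only.
package My::QueryCheck;

our $STRICT = 0;    # 0: warn about flagged strings, 1: throw an exception

sub check_param {
    my ($name, $value) = @_;
    return 1 unless utf8::is_utf8($value);
    my $msg = "parameter [$name] has the utf8 flag set; encode it to octets first";
    die "$msg\n" if $STRICT;
    warn "$msg\n";
    return 0;
}

package main;

my $chars = "\x{263A}";    # character string, utf8 flag on
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Default behaviour: existing callers only get a warning.
My::QueryCheck::check_param(smiley => $chars);

# Opt-in strictness: the same call now throws.
$My::QueryCheck::STRICT = 1;
my $ok = eval { My::QueryCheck::check_param(smiley => $chars); 1 };
print "strict mode died: $@" unless $ok;
```

The point of the package variable is that existing code keeps running unchanged, while callers who want hard failures can switch them on in one place.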
>     $u->query_form(foo => "bål");
>     $u->query_form(foo => "bål", bar => "\N{WATCH}");
>
> which prints:
>
>     http://www.example.com?foo=b%E5l
>     http://www.example.com?foo=b%C3%A5l&bar=%E2%8C%9A
>
> Here the encoding of the first parameter depends on the presence of
> the second parameter which is clearly not a good thing.
Right, the 8859-1 encoded bål was upgraded to a utf8 character string when combined with the utf8 character string. It might have been different if your script was encoded in utf8 and had the use utf8 pragma.
I kind of wish the utf8 flag in Perl was called "character" instead. It's not utf8 (ok, it is), but it really should be thought of as an abstract representation of characters without any encoding.
When the utf8 flag is set then the string is a Perl character string, and to use it outside of Perl (i.e. sending it to another server) it really needs to be converted to octets. And most of the time it's up to the user to decide what encoding to use and not something the module can guess.
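As a minimal sketch of "convert to octets first, and let the caller pick the encoding": the helper `percent_encode_octets` below is hypothetical and hand-rolled (it is not the URI module's code), and it uses only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode every octet outside the unreserved set.
# It refuses anything that is not a string of octets.
sub percent_encode_octets {
    my ($octets) = @_;
    die "expected octets, got wide characters\n" if $octets =~ /[^\x00-\xFF]/;
    (my $out = $octets) =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $out;
}

my $text = "b\x{E5}l";    # the characters b, a-with-ring, l

# The caller decides which octets go on the wire.
my $as_utf8   = percent_encode_octets(encode("UTF-8",      $text));
my $as_latin1 = percent_encode_octets(encode("ISO-8859-1", $text));

print "$as_utf8\n";      # b%C3%A5l
print "$as_latin1\n";    # b%E5l
```

Both results are valid url-encoded data; which one is right depends only on what the sender and receiver agreed on.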
-- Bill Moseley moseley@hank.org Sent from my iMutt
The main change this time is the introduction of handlers to drive the processing of requests in LWP::UserAgent. You can also register your own handlers for modifying and processing requests or responses on their way, which I think is a much more flexible approach than trying to subclass LWP::UserAgent to customize it. If we had had these early on, the LWP::UserAgent API could have been much simpler, as the effect of most current attributes can easily be set up with trivial handlers. Also, thanks to contributions by Bron Gondwana, LWP's Basic/Digest authentication modules now register handlers that allow them to fill in the Authorization headers automatically, without first taking the round trip of a 401 response, when LWP knows the credentials for a given realm.
This code shows some examples of what you can use custom handlers for:
  #!perl

  use LWP::UserAgent;
  use HTTP::Response;
  $ua = LWP::UserAgent->new;

  # example of handler to add custom headers
  $ua->add_handler("request_prepare", sub {
      my $req = shift;
      $req->init_header("Accept-Language" => ["no", "en-US"]);
  });

  # example of handler to rewrite the method for certain requests
  $ua->add_handler("request_prepare", sub {
          my $req = shift;
          $req->method("TRACE");
      },
      m_scheme => "http",
      m_method => "GET",
      m_domain => "example.com",
  );

  # example of handler to monitor the requests that are sent
  $ua->add_handler("request_send", sub {
      my $req = shift;
      print $req->as_string;
      return;
  });

  # example of handler that only passes HTTP requests through
  $ua->add_handler("request_send", sub {
      my $req = shift;
      return if $req->uri->scheme =~ /^http/;
      return HTTP::Response->new(403, undef,
          ["Server" => $ua->agent, "Content-Type" => "text/plain"],
          "It's our brand new policy to restrict access to HTTP!\n"
      );
  });

  # example of handler that counts how much data we receive
  my $bytes_received = 0;
  $ua->add_handler("response_data", sub {
      $bytes_received += length($_[3]);
  });
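One more use that follows from the "request_send" examples above: because a handler that returns a response short-circuits the request, it can stand in for the network entirely, for instance in tests. This is only a sketch; the URL and the reply text are placeholders.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new;
my @log;

# Answer every request locally, so nothing is sent over the network.
$ua->add_handler(request_send => sub {
    my ($req) = @_;
    push @log, $req->method . " " . $req->uri;
    return HTTP::Response->new(200, "OK",
        ["Content-Type" => "text/plain"], "canned reply\n");
});

my $res = $ua->get("http://www.example.com/");

print "$_\n" for @log;
print $res->status_line, "\n";
print $res->content;
```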
On Thu, Sep 25, 2008 at 5:57 AM, Bill Moseley <moseley@hank.org> wrote: > On Sun, Sep 21, 2008 at 08:36:32AM +0200, Gisle Aas wrote: >> The issue with dropped chars has been fixed so I don't worry about >> that. Just upgrade the URI module. >> >> The remaining issue is if $url->query_form should accept Unicode data >> and automatically UTF-8 encode it as it does now. When I accepted >> that patch I though it would be harmless as this provide a convenience >> for some at the same time as it does not change anything for users >> that properly encode their data before passing it to this API. What's >> problematic is that this strengthens the idea that the UTF-8 flag has >> semantic meaning at the Perl level. Strings with chars in the range >> 128-255 behave differently depending on the internal representation. >> I'm not happy about that. It's certainly not my idea of a sane >> Unicode model. >> >> To me that leaves 2 options; either make the URI API strict and only >> accept args that are bytes (strings that can be utf8::downgraded) or >> just live with the ugliness of inconsistent Unicode model and try to >> document the issues better over time. I'm leaning towards the later. > > Sorry, kind of got stuck behind work here. > > So, in my situation I need to post some utf8 characters. The service > I'm using requires an ?encoding=utf8 query parameter to say what > encoding the text is encoded in. The post doesn't include > a charset: > > Content-Type: application/x-www-form-urlencoded > > So it seems the server needs to be explicitly told. > > > The problem I had was if I passed in a character string (utf8 flag on) > then the url-encoding process dropped chars. You say that has been > fixed. I fixed on my side by simply calling encode_utf8 to convert my > character string into octets. Then all octets were url-encoded and > passed to the server and all works fine.
Yes and that will always continue to work. If you encode the strings yourself things behave consistently and you select what encoding to use.
If you mix encoded (byte) strings and Unicode strings, bad things happen. If you only use Unicode strings (and at least one of them has the utf8 flag set, or none of them has chars above 127) you should get UTF-8 encoded output.
> Now, here's my question. Could I pass in any byte (octet) string and > have it url-encoded?
Yes. URI->query_form just encodes the bytes as is.
> Do the url-encoded post parameters have to be of > a given character encoding or is that just an agreement between the > sender and receiver?
There certainly has to be agreement between the sender and the receiver. I thought the normal behaviour was to encode using the same encoding as the document the form was embedded in uses.
> That is, can I encode my character string into any character > encoding and send it url-encoded? Then as long as the server > receiving the post knows how to decoded (using same encoding I used) > then it would be fine?
Right.
> If that's the case then it would seem like query_param should die if > it receives any strings with the utf8 flag on. You can't encode_utf8 > or utf8::downgrade because we don't know what (octet) encoding that > the sender and receiver agreed on.
I basically agree with that view. It can still be convenient to have it assume UTF-8 encoding in this case, and there is the potential that introducing this strictness breaks code.
It could also be argued that it might be helpful to break such code because it has the potential of already being broken. Consider this example:
On Sun, Sep 21, 2008 at 08:36:32AM +0200, Gisle Aas wrote: > The issue with dropped chars has been fixed so I don't worry about > that. Just upgrade the URI module. > > The remaining issue is if $url->query_form should accept Unicode data > and automatically UTF-8 encode it as it does now. When I accepted > that patch I though it would be harmless as this provide a convenience > for some at the same time as it does not change anything for users > that properly encode their data before passing it to this API. What's > problematic is that this strengthens the idea that the UTF-8 flag has > semantic meaning at the Perl level. Strings with chars in the range > 128-255 behave differently depending on the internal representation. > I'm not happy about that. It's certainly not my idea of a sane > Unicode model. > > To me that leaves 2 options; either make the URI API strict and only > accept args that are bytes (strings that can be utf8::downgraded) or > just live with the ugliness of inconsistent Unicode model and try to > document the issues better over time. I'm leaning towards the later.
Sorry, kind of got stuck behind work here.
So, in my situation I need to post some utf8 characters. The service I'm using requires an ?encoding=utf8 query parameter to say what encoding the text is encoded in. The post doesn't include a charset:
Content-Type: application/x-www-form-urlencoded
So it seems the server needs to be explicitly told.
The problem I had was if I passed in a character string (utf8 flag on) then the url-encoding process dropped chars. You say that has been fixed. I fixed on my side by simply calling encode_utf8 to convert my character string into octets. Then all octets were url-encoded and passed to the server and all works fine.
Now, here's my question. Could I pass in any byte (octet) string and have it url-encoded? Do the url-encoded post parameters have to be of a given character encoding or is that just an agreement between the sender and receiver?
That is, can I encode my character string into any character encoding and send it url-encoded? Then as long as the server receiving the post knows how to decoded (using same encoding I used) then it would be fine?
If that's the case then it would seem like query_param should die if it receives any strings with the utf8 flag on. You can't encode_utf8 or utf8::downgrade because we don't know what (octet) encoding that the sender and receiver agreed on.
-- Bill Moseley moseley@hank.org Sent from my iMutt
The issue with dropped chars has been fixed so I don't worry about that. Just upgrade the URI module.
The remaining issue is whether $url->query_form should accept Unicode data and automatically UTF-8 encode it as it does now. When I accepted that patch I thought it would be harmless, as it provides a convenience for some while changing nothing for users who properly encode their data before passing it to this API. What's problematic is that this strengthens the idea that the UTF-8 flag has semantic meaning at the Perl level. Strings with chars in the range 128-255 behave differently depending on the internal representation. I'm not happy about that. It's certainly not my idea of a sane Unicode model.
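A small core-Perl illustration of "same string, different internal representation": at the Perl level the two strings below are indistinguishable, and only code that inspects the flag (as the query_form patch did) can tell them apart.

```perl
use strict;
use warnings;

my $bytes = "b\xE5l";     # stored internally as three single-byte chars
my $wide  = "b\xE5l";
utf8::upgrade($wide);     # same characters, now stored internally as UTF-8

printf "flag:   %d vs %d\n", utf8::is_utf8($bytes) ? 1 : 0, utf8::is_utf8($wide) ? 1 : 0;
printf "equal:  %s\n", $bytes eq $wide ? "yes" : "no";
printf "length: %d vs %d\n", length($bytes), length($wide);
```

The flags differ (0 vs 1), yet the strings compare equal and both have length 3.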
To me that leaves 2 options: either make the URI API strict and only accept args that are bytes (strings that can be utf8::downgraded), or just live with the ugliness of an inconsistent Unicode model and try to document the issues better over time. I'm leaning towards the latter.
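The strict option could be checked along these lines. The helper name `require_bytes` is hypothetical; the sketch only shows the test "can this string be utf8::downgraded", which depends on the characters in the string rather than on the flag.

```perl
use strict;
use warnings;

# Hypothetical helper: return the argument as bytes, or die if it
# contains characters above 255 and so cannot be downgraded.
sub require_bytes {
    my ($value) = @_;
    my $copy = $value;
    utf8::downgrade($copy, 1)    # second arg: return false instead of dying
        or die "argument contains wide characters; encode it first\n";
    return $copy;
}

my $upgraded = "b\xE5l";
utf8::upgrade($upgraded);                    # flag on, but all chars <= 255
my $ok_bytes = require_bytes($upgraded);     # accepted despite the flag

my $rejected = !eval { require_bytes("\x{263A}"); 1 };
print "wide string rejected: ", ($rejected ? "yes" : "no"), "\n";
```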
On Fri, Sep 19, 2008 at 03:54:13PM +0200, Gisle Aas wrote: > This is probably not be a perl issue as I did not have the same > version of URI and libwww-perl installed where I tested. What version > of URI and libwww-perl did you use? I now think that you'll see the > first behaviour with URI <= 1.35 and the later if URI is more recent.
I have version 1.35.
-- Bill Moseley moseley@hank.org Sent from my iMutt
On Fri, Sep 19, 2008 at 02:14:25PM +0200, Gisle Aas wrote: > I wonder if this is a bug in perl itself. With perl-5.8.8 I get: > > count=123&works=%E2%98%BA&does_not_work=&foo=bar > > as you showed, but with perl-5.8.9(tobe) and perl-5.10.0 I get: > > count=123&works=%C3%A2%C2%98%C2%BA&does_not_work=%E2%98%BA&foo=bar > > so now it's the 'works' case that does_not_work but in a different way :(
That's bothersome.
Would you agree that the url encoding must be done on octets? If so, then I think it would be correct to issue a warning if the utf8 flag is on.
I think doing it in query_form would be the best place as it tells the user what specific item had the utf8 flag on.
    while (my($key,$vals) = splice(@new, 0, 2)) {
        $key = '' unless defined $key;

        warn "key [$key] has utf8 flag set and may not url-encode"
            if utf8::is_utf8( $key );

        $key =~ s/([;\/?:@&=+,\$\[\]%])/$URI::Escape::escapes{$1}/g;
        $key =~ s/ /+/g;
        $vals = [ref($vals) eq "ARRAY" ? @$vals : $vals];
        for my $val (@$vals) {
            $val = '' unless defined $val;

            warn "value [$val] for key [$key] has utf8 flag set and may not url-encode"
                if utf8::is_utf8( $val );
Actually, I guess I'd argue to die, but that might break existing code. After all, if trying to url-encode characters might result in dropped or altered data that's pretty serious.
Thanks,
-- Bill Moseley moseley@hank.org Sent from my iMutt
This is probably not a perl issue, as I did not have the same version of URI and libwww-perl installed where I tested. What version of URI and libwww-perl did you use? I now think that you'll see the first behaviour with URI <= 1.35 and the latter if URI is more recent.
--Gisle
On Fri, Sep 19, 2008 at 2:14 PM, Gisle Aas <gisle@aas.no> wrote: > I wonder if this is a bug in perl itself. With perl-5.8.8 I get: > > count=123&works=%E2%98%BA&does_not_work=&foo=bar > > as you showed, but with perl-5.8.9(tobe) and perl-5.10.0 I get: > > count=123&works=%C3%A2%C2%98%C2%BA&does_not_work=%E2%98%BA&foo=bar > > so now it's the 'works' case that does_not_work but in a different way :( > > --Gisle > > > On Fri, Sep 19, 2008 at 7:54 AM, Bill Moseley <moseley@hank.org> wrote: >> I have a form that posts to a service, and noticed not all >> parameters were being posted. >> >> I realized my mistake of not encoding to utf8, but I'm curious why >> this did not generate any warnings. >> >> I can imagine others getting caught by this, so a warning would be very >> helpful. >> >> Not really sure where it should be checked -- in query_form when >> processing the individual parameters, I suspect, but the damage seems >> to happen when $uri->query is called: >> >> $q =~ s/([^$URI::uric])/$URI::Escape::escapes{$1}/go; >> >> Would it not be better to issue a waring? >> >> >> use HTTP::Request::Common; >> use strict; >> use warnings; >> use Data::Dumper; >> use Encode; >> >> my $content = { >> foo => 'bar', >> count => 123, >> works => encode_utf8("\x{263A}"), >> does_not_work => "\x{263A}", >> }; >> >> my $x = HTTP::Request::Common::POST( >> 'http://localhost.com', >> $content, >> ); >> print Dumper $x; >
so now it's the 'works' case that does_not_work but in a different way :(
--Gisle
On Fri, Sep 19, 2008 at 7:54 AM, Bill Moseley <moseley@hank.org> wrote:
> I have a form that posts to a service, and noticed not all
> parameters were being posted.
>
> I realized my mistake of not encoding to utf8, but I'm curious why
> this did not generate any warnings.
>
> I can imagine others getting caught by this, so a warning would be very
> helpful.
>
> Not really sure where it should be checked -- in query_form when
> processing the individual parameters, I suspect, but the damage seems
> to happen when $uri->query is called:
>
>     $q =~ s/([^$URI::uric])/$URI::Escape::escapes{$1}/go;
>
> Would it not be better to issue a warning?
>
>
>     use HTTP::Request::Common;
>     use strict;
>     use warnings;
>     use Data::Dumper;
>     use Encode;
>
>     my $content = {
>         foo => 'bar',
>         count => 123,
>         works => encode_utf8("\x{263A}"),
>         does_not_work => "\x{263A}",
>     };
>
>     my $x = HTTP::Request::Common::POST(
>         'http://localhost.com',
>         $content,
>     );
>     print Dumper $x;
I have a form that posts to a service, and noticed not all parameters were being posted.
I realized my mistake of not encoding to utf8, but I'm curious why this did not generate any warnings.
I can imagine others getting caught by this, so a warning would be very helpful.
Not really sure where it should be checked -- in query_form when processing the individual parameters, I suspect, but the damage seems to happen when $uri->query is called:
On Thu, Sep 18, 2008 at 08:22:46AM -0400, Kathalkar, Sanket wrote: > I am going to use LWP module for accessing urls which are SSL > enabled. Is Crypt::SSLeay required for this?
The README.SSL distributed with LWP answers this: http://search.cpan.org/src/GAAS/libwww-perl-5.814/README.SSL
> Do I need to create a certificate to access a remote server?
I've never needed to: most SSL enabled Web servers don't need client certificates. But if you're running against a server that needs an SSL certificate to authenticate you, you will need to provide that certificate somehow.
I am going to use the LWP module for accessing urls which are SSL enabled. Is Crypt::SSLeay required for this? Give me the steps to implement this. Do I need to create a certificate to access a remote server?
* stefano tacconi wrote: >I'm writing a simple script to download some web pages on the net. >Using LWP it's works fine, but how can I get html page with strange >characher?
You are probably looking for HTML::Encoding, the script in the synopsis shows how to decode the content; HTTP::Response::Encoding seems to be a rather crude module that is unaware of HTML semantics. -- Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
On Sun, Sep 7, 2008 at 1:49 PM, Michael Greb <mgreb@linode.com> wrote:
> On Sep 5, 2008, at 7:23 PM, Gisle Aas wrote:
>>
>> True; and in this case we need to define what happens when fields are
>> modified with 'push', 'set' or 'init' and 'remove' as that's the API
>> that modify stuff.  Let me suggest the following definition of the
>> behaviour:
>>
>>  - 'push' always append the field at the end of all headers.  multiple
>>    occurrences of a field name do not have to be consecutive.
>>
>>  - 'init' either does nothing or it works like 'push'.
>>
>>  - 'remove' will always remove all occurrences of a field.
>>
>>  - 'set' will work like 'push' if no other occurrence of the field exists.
>>
>>  - 'set' will update the first occurrence if the field exists (and
>>    remove all other occurrences).  if multiple field values is provided
>>    with 'set' they are basically all injected at the location of the
>>    first existing value.
>
>
> On Sep 6, 2008 at 2:57 AM, Gisle Aas wrote:
>>
>> I think it makes sense to be able to enable them separately.
>> Suggested interface:
>>
>>    $h->scan(\&cb, original_order => 1, original_case => 1);
>>    $h->as_string(eol => "\n", original_order => 1, original_case => 1);
>
> The attached patch uses the interface above and works towards the behavior
> outlined in the first message.  Due to the headers being stored as a hash,
> pushing does not currently preserve previous values, second and subsequent
> pushes of the same header will overwrite the previous value.  Supporting
> this would require a change in how the headers are stored within the module.
> Your thoughts?
I think it's better to just use your original approach and keep the representation as it used to be, with the addition of an array that records the original field names and their order. This should lead to a smaller patch, as the only things that need to change are the code that sets headers and the scan method. I also like header lookups to be efficient and the representation compact.
>   Server: Fool/1.0
>   content-encoding: gzip
>   Content-Type: text/plain; charset="UTF-8"
>   Content-Encoding: base64
>   Date: Fri Sep  5 10:24:37 CEST 2008
>
> Would be stored as (assuming push_header):
The invariant that needs to hold is that there is the same number of elements in {"::original_fields"} as there are values for all the other keys.
Pushing a value is trivial; only change from what we have now is appending the original field name to {"::original_fields"}.
The only state modification operation that becomes more complex is setting a header value. It has to:
 - update the values in the hash as before
 - locate the first occurrence of the field name in {"::original_fields"} => $idx
 - remove all other occurrences of the field name
 - splice(@{"::original_fields"}, $idx, 1, ($orig_field_name) x $numbers_of_values_set);
When 'scan' wants to iterate over the original headers it would have to keep an index into the values array for each field that repeats.
A more compact representation could be to store {"::original_fields"} as a ":"-separated string; but we can think about that optimization later.
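To make the representation and its invariant concrete, here is a minimal stand-alone sketch. It is not HTTP::Headers and not the patch under discussion: the package name `Sketch::Headers` and its method names are invented, and field names are normalized with a plain lc() for simplicity.

```perl
use strict;
use warnings;

package Sketch::Headers;    # hypothetical class, for illustration only

sub new { my $class = shift; return bless { "::original_fields" => [] }, $class }

# push: append values to the hash entry and record the name once per value.
sub push_header {
    my ($self, $field, @values) = @_;
    push @{ $self->{ lc($field) } }, @values;
    push @{ $self->{"::original_fields"} }, ($field) x @values;
    return;
}

# set: replace at the position of the first occurrence, drop the others.
sub set_header {
    my ($self, $field, @values) = @_;
    my $key   = lc($field);
    my $order = $self->{"::original_fields"};
    my ($idx) = grep { lc($order->[$_]) eq $key } 0 .. $#$order;
    return $self->push_header($field, @values) unless defined $idx;
    my @new;
    for my $i (0 .. $#$order) {
        if    ($i == $idx)                { push @new, ($field) x @values }
        elsif (lc($order->[$i]) ne $key)  { push @new, $order->[$i] }
    }
    @$order = @new;
    $self->{$key} = [@values];
    return;
}

# remove: drop every occurrence of the field.
sub remove_header {
    my ($self, $field) = @_;
    my $key = lc($field);
    delete $self->{$key};
    @{ $self->{"::original_fields"} } =
        grep { lc($_) ne $key } @{ $self->{"::original_fields"} };
    return;
}

# scan in original order, keeping a per-field index into the value lists.
sub scan {
    my ($self, $cb) = @_;
    my %next;
    for my $field (@{ $self->{"::original_fields"} }) {
        my $key = lc($field);
        $cb->($field, $self->{$key}[ $next{$key}++ ]);
    }
    return;
}

package main;

my $h = Sketch::Headers->new;
$h->push_header("Server"           => "Fool/1.0");
$h->push_header("content-encoding" => "gzip");
$h->push_header("Content-Type"     => 'text/plain; charset="UTF-8"');
$h->push_header("Content-Encoding" => "base64");
$h->set_header("Content-Encoding"  => "identity");

my @wire;
$h->scan(sub { push @wire, "$_[0]: $_[1]" });
print "$_\n" for @wire;
```

After the set call the two Content-Encoding entries collapse into one at the position of the first, so the scan visits Server, Content-Encoding, Content-Type in that order.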
On Sep 5, 2008, at 7:23 PM, Gisle Aas wrote: > On Fri, Sep 5, 2008 at 7:49 PM, Michael Greb <mgreb@linode.com> wrote: > As I said above I would solve the problem by not changing > 'header_field_names' at all. Do you feel the scan interface isn't > good enough for your use case?
This makes a lot of sense and scan will suit us just fine.
>> Writing code is easy, it's deciding how that code should behave >> that is the >> hard part. > > True; and in this case we need to define what happens when fields are > modified with 'push', 'set' or 'init' and 'remove' as that's the API > that modify stuff. Let me suggest the following definition of the > behaviour: > > - 'push' always append the field at the end of all headers. multiple > occurrences of a field name do not have to be consecutive. > > - 'init' either does nothing or it works like 'push'. > > - 'remove' will always remove all concurrences of a field. > > - 'set' will work like 'push' if no other occurrence of the field > exists. > > - 'set' will update the first occurrence if the field exists (and > remove all other occurrences). if multiple field values is provided > with 'set' they are basically all injected at the location of the > first existing value. > > You want to try to implement this?
Yes. Have a good chance of losing net connectivity at home this weekend so this makes for a perfect no Internets required weekend project ;)
On Fri, Sep 5, 2008 at 7:49 PM, Michael Greb <mgreb@linode.com> wrote: > On Sep 5, 2008, at 4:29 AM, Gisle Aas wrote: >> >> Hi Michael, >> >> This seems like a very useful addition to libwww-perl. I have been >> wanting a mode where $response->as_string would show responses exactly >> as they where received without adding, or reordering of the headers >> or even fix up the casing for the header field names. A patch like >> yours should make this much easier. >> >> Your patch does not address the preserving-of-case for header filed >> names. Is that not required for your signing server? > > We join the values of the signed headers without the name of the header so > case doesn't matter for us. That said, it certainly makes sense to store > the headers in their original case in _wire_order rather than the normalized > version. Should the header_field_names and the pass method both then return > the headers in the original case when dont_sort is passed?
I think I would prefer to leave 'header_field_names' alone and only support original field order and field name casing for the 'scan' and 'as_string' methods. This since 'header_field_names' is documented to not repeat field names, while the others do.
>> It also seems your approach makes it hard to deal correctly with >> repeated headers mixed in with others; for instance something like >> this ugly response: >> >> 200 OK >> Server: Fool/1.0 >> content-encoding : >> gzip >> Content-Type: text/plain; charset="UTF-8" >> Content-Encoding: base64 >> Date: Fri Sep 5 10:24:37 CEST 2008 >> >> H4sICETrwEgAA3h4eADLSM3JyVcozy/KSVHkAgC0r9cBDQAAAA== >> >> Your thoughts? > > > I'm not sure exactly what the right way to handle this would be. > header_field_names is speced in the docs as returning only the distinct > header field names. Perhaps rather than an optional dont_sort argument this > should be a new method, something like 'wire_header_fields' that returns all > headers in the original case and order including duplicates? This also > relates to the as_string method and your desire to have a mode that returns > things in thier original form.
As I said above I would solve the problem by not changing 'header_field_names' at all. Do you feel the scan interface isn't good enough for your use case?
> Writing code is easy, it's deciding how that code should behave that is the > hard part.
True; and in this case we need to define what happens when fields are modified with 'push', 'set', 'init' and 'remove', as those are the API methods that modify things. Let me suggest the following definition of the behaviour:
- 'push' always appends the field at the end of all headers. Multiple occurrences of a field name do not have to be consecutive.
- 'init' either does nothing or it works like 'push'.
- 'remove' will always remove all occurrences of a field.
- 'set' will work like 'push' if no other occurrence of the field exists.
- 'set' will update the first occurrence if the field exists (and remove all other occurrences). If multiple field values are provided with 'set' they are basically all injected at the location of the first existing value.
On Sep 5, 2008, at 4:29 AM, Gisle Aas wrote: > Hi Michael, > > This seems like a very useful addition to libwww-perl. I have been > wanting a mode where $response->as_string would show responses exactly > as they where received without adding, or reordering of the headers > or even fix up the casing for the header field names. A patch like > yours should make this much easier. > > Your patch does not address the preserving-of-case for header filed > names. Is that not required for your signing server?
We join the values of the signed headers without the name of the header so case doesn't matter for us. That said, it certainly makes sense to store the headers in their original case in _wire_order rather than the normalized version. Should the header_field_names and the pass method both then return the headers in the original case when dont_sort is passed?
> It also seems your approach makes it hard to deal correctly with
> repeated headers mixed in with others; for instance something like
> this ugly response:
>
>   200 OK
>   Server: Fool/1.0
>   content-encoding :
>      gzip
>   Content-Type: text/plain; charset="UTF-8"
>   Content-Encoding: base64
>   Date: Fri Sep  5 10:24:37 CEST 2008
>
>   H4sICETrwEgAA3h4eADLSM3JyVcozy/KSVHkAgC0r9cBDQAAAA==
>
> Your thoughts?
I'm not sure exactly what the right way to handle this would be. header_field_names is specified in the docs as returning only the distinct header field names. Perhaps rather than an optional dont_sort argument this should be a new method, something like 'wire_header_fields', that returns all headers in the original case and order including duplicates? This also relates to the as_string method and your desire to have a mode that returns things in their original form.
Writing code is easy, it's deciding how that code should behave that is the hard part.
Mike
- -- Michael Greb Linode.com 609-593-7103 ext 1205
This seems like a very useful addition to libwww-perl. I have been wanting a mode where $response->as_string would show responses exactly as they were received, without adding or reordering headers or even fixing up the casing of the header field names. A patch like yours should make this much easier.
Your patch does not address preserving the case of header field names. Is that not required for your signing server?
It also seems your approach makes it hard to deal correctly with repeated headers mixed in with others; for instance something like this ugly response:
  200 OK
  Server: Fool/1.0
  content-encoding :
     gzip
  Content-Type: text/plain; charset="UTF-8"
  Content-Encoding: base64
  Date: Fri Sep  5 10:24:37 CEST 2008
On Thu, Sep 4, 2008 at 9:35 PM, Michael Greb <mgreb@linode.com> wrote:
> Greetings,
>
> We are currently using HTTP::Daemon to prototype a project and have a need
> to access headers in the order they were sent over the network. Our
> particular use case is cryptographically signing a subset of the headers and
> sending this signature as an additional header.
>
> A specified set of headers are to be included in the signature if present in
> the request. We join the content of these headers (with "\n") then
> calculate the expected signature and compare it to the value submitted by
> the client. In order to get the same signature, we must join the header
> content in the same order as the client. If we only needed to support perl
> clients using LWP::UserAgent, this wouldn't be an issue as HTTP::Daemon and
> LWP::UserAgent both use HTTP::Headers and the order the headers will be
> presented to the consuming script is predictable. Unfortunately, we must
> support multiple languages.
>
> The HTTP client is allowed to join the headers in preparation for signing in
> any order it wishes so long as it then sends the headers in the same order
> over the network. The attached patch stores the order headers are added to
> the HTTP::Headers object in an arrayref ($self->{_wire_order}). The
> header_field_names and scan methods are extended to take an optional value
> that if present and true cause the headers to be returned/visited based on
> the order of elements in $self->{_wire_order} rather than the existing 'best
> practices' order. The next logical step would be similar extension to the
> as_string method.
>
> This code has been tested and, thanks to great tests, I was able to catch
> missing the clear method in my first go at the functionality. All tests
> currently pass except for a few[1] that seem to be related to the new
> run_handler method[2]. I'm a bit unsure that the push within the _header
> method does the right thing in all cases (particularly adding an additional
> value to an existing header and replacing an existing header with a new
> value).
>
> This patch does include an update to the relevant docs but does not include
> new tests. Should the functionality be deemed useful for inclusion in
> libwww-perl I can go ahead and extend the as_string method and add some new
> tests to match the new functionality.
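The signing scheme described above can be illustrated with a minimal sketch in plain Perl. It is not the attached patch and does not touch HTTP::Headers; the helper names headers_in_wire_order and signing_string, the header names and the sample request are all made up for illustration. It parses a raw header block, keeps the fields in the order they arrived, and joins the values of the signed subset with "\n". Continuation lines are ignored to keep the sketch short.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw header block into [name, value]
# pairs, preserving the order in which the fields appeared on the wire.
sub headers_in_wire_order {
    my ($raw) = @_;
    my @pairs;
    for my $line (split /\r?\n/, $raw) {
        last if $line eq '';                      # blank line ends the headers
        if ($line =~ /^([^:\s]+)\s*:\s*(.*)$/) {
            push @pairs, [$1, $2];
        }
    }
    return @pairs;
}

# Hypothetical helper: join the values of the signed headers with "\n",
# in wire order rather than in the order the caller lists them.
sub signing_string {
    my ($raw, @signed) = @_;
    my %want = map { lc($_) => 1 } @signed;
    return join "\n",
        map  { $_->[1] }
        grep { $want{ lc $_->[0] } } headers_in_wire_order($raw);
}

# Made-up request headers for the example.
my $raw = "Host: example.com\r\nX-Date: 2008-09-04\r\n"
        . "Accept: */*\r\nX-Nonce: abc123\r\n\r\n";

# The caller lists X-Nonce first, but X-Date was sent first.
my $to_sign = signing_string($raw, qw(X-Nonce X-Date));
```

The point of the sketch is only that the join order follows the wire, so client and server compute the signature over the same string regardless of how either side stores its headers.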
> I'm not sure whether you're still interested in this, but I think
> I've come across a bug in HTTP::Cookies. If you're not the right
> person for me to handle this, please let me know who is.
I'm the right one; but it's usually best to send requests like this to the libwww mailing list. Cc:-ed.
> The problem is in add_cookie_header. If the cookie version is
> nonzero and the cookie contents include a non-alpha (\W) character,
> it escapes any quotes or slashes in the cookie value.
Why do you specify a nonzero version number without using the Set-Cookie2 header?
I'm thinking that the right fix for this might be to just force 'version=0' for any cookie set with 'Set-Cookie'. This patch achieves that:
>
> The problem arises when the server has delivered a cookie value
> that is ENCLOSED in quotes, i.e.,
>     Set-Cookie: member="whatever"; version=1; Path=/
>
> When it comes time for add_cookie_header to do its thing, it generates
>     Cookie: member="\"whatever\""; $Path="/"
>     Cookie2: $Version="1"
>
> I guess there are 2 bugs here:
> 1) The biggest problem is with the quoting. I think I've fixed
> this by inserting one line in Cookies.pm:
>
>     # do we need to quote the value
>     if ($val =~ /\W/ && $version) {
>         $val =~ s/^"(.*)"$/$1/; ### RLS 9/3/08
>         $val =~ s/([\\\"])/\\$1/g;
>         $val = qq("$val");
>     }
>
> 2) The second problem is with the treatment of the Path and version
> fields. They appear to be treated as if they were cookie values.
> And yet they are transmitted with a prefix of "$". I REALLY don't
> understand what's going on here, and I'm not inclined to mess with it.
Read RFC 2965 if you want to understand the deal with $Path and $Version.
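To make the quoting problem from the report concrete, here is a small standalone sketch. It copies the quoting logic quoted above into a hypothetical function, quote_cookie_value, with a made-up third argument that switches the reporter's extra line on or off. It does not call HTTP::Cookies and is not the 'version=0' patch mentioned earlier in this reply.

```perl
use strict;
use warnings;

# Hypothetical standalone copy of the quoting step discussed above.
# $strip_existing_quotes toggles the one-line change from the report.
sub quote_cookie_value {
    my ($val, $version, $strip_existing_quotes) = @_;
    if ($val =~ /\W/ && $version) {
        $val =~ s/^"(.*)"$/$1/ if $strip_existing_quotes;
        $val =~ s/([\\"])/\\$1/g;     # escape backslashes and quotes
        $val = qq("$val");            # wrap the value in quotes
    }
    return $val;
}

# Value as delivered by: Set-Cookie: member="whatever"; version=1; Path=/
my $without_fix = quote_cookie_value('"whatever"', 1, 0);
my $with_fix    = quote_cookie_value('"whatever"', 1, 1);
```

Without the extra line the already-quoted value is escaped and quoted a second time, which is the doubled quoting shown in the report; with it, the value is sent back with a single pair of quotes.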
On Mon, Sep 1, 2008 at 11:15 AM, Phil Archer <parcher@fosi.org> wrote:
> Hi,
>
> I've used LWP in several apps in which the key bit of information I'm after
> is the headers. I've therefore got used to the fact that if the returned
> resource is HTML, one of the triggers for "OK, that's all the headers and
> everything else must be content" is the presence of anything in the <head>
> section of the document that LWP doesn't recognise.
>
> Take this, for example:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html
> xmlns:creativeCommons='http://backend.userland.com/creativeCommonsRssModule'
> xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
>
> <creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>
>
> <head profile="http://gmpg.org/xfn/11">
> ...
>
> Perfectly valid XHTML - but... LWP doesn't recognise the <creativecommons...
> tag and so stops parsing the headers.
>
> The User Agent package I'm using is version 2.31
>
> So, some questions:
>
> 1. Which modules need updating so that LWP can recognise this kind of thing
> as valid <head> content
It's the HTML::HeadParser module in the HTML-Parser dist.
> 2. Has anyone written such a module?
Not that I know about.
>
> As a demonstration, [1] and [2] show the status line, headers_as_string and
> content from two versions of the same document, the only difference between
> the two being that in [2], the <creativecommons..> tag is commented out. You
> can get this output from any URI using the form at [3].
>
> Thanks for any help
>
> Phil.
>
> [1]
> http://www.icra.org/cgi-bin/HTTP_Headers.cgi?url=http%3A%2F%2Fwww.icra.org%2Flabel%2FHTTP-Test%2Fspace.htm
> [2]
> http://www.icra.org/cgi-bin/HTTP_Headers.cgi?url=http%3A%2F%2Fwww.icra.org%2Flabel%2FHTTP-Test%2Fspace-mod.htm
> [3] http://www.icra.org/label/HTTP-Test/
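For anyone who wants to experiment with the module named above, this is a minimal sketch of driving HTML::HeadParser directly, outside of LWP. The sample document is made up and deliberately contains only elements the parser is known to handle; it does not reproduce the reported problem, and how the parser treats an unknown element before <head> may depend on the installed HTML-Parser version.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Made-up document for illustration only.
my $html = <<'HTML';
<html>
<head>
<title>Header test</title>
<meta http-equiv="Content-Language" content="en-US">
</head>
<body><p>Body text</p></body>
</html>
HTML

my $p = HTML::HeadParser->new;
$p->parse($html);    # returns false once the parser decides <head> is over

# Values found in <head> are exposed through an HTTP::Headers-style interface.
my $title = $p->header('Title');
my $lang  = $p->header('Content-Language');
```

Feeding the same parser the two test documents from [1] and [2] would be one way to see exactly where it stops collecting headers.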
I've used LWP in several apps in which the key bit of information I'm after is the headers. I've therefore got used to the fact that if the returned resource is HTML, one of the triggers for "OK, that's all the headers and everything else must be content" is the presence of anything in the <head> section of the document that LWP doesn't recognise.
Take this, for example:
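<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html
xmlns:creativeCommons='http://backend.userland.com/creativeCommonsRssModule'
xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">

<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/3.0/</creativeCommons:license>

<head profile="http://gmpg.org/xfn/11">
...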
Perfectly valid XHTML - but... LWP doesn't recognise the <creativecommons... tag and so stops parsing the headers.
The User Agent package I'm using is version 2.31
So, some questions:
1. Which modules need updating so that LWP can recognise this kind of thing as valid <head> content
2. Has anyone written such a module?
As a demonstration, [1] and [2] show the status line, headers_as_string and content from two versions of the same document, the only difference between the two being that in [2], the <creativecommons..> tag is commented out. You can get this output from any URI using the form at [3].
-- Phil Archer Chief Technical Officer, Family Online Safety Institute w. http://www.fosi.org/people/philarcher/
I have made some changes to libwww to support IPv6, but in doing so I seem to have lost IPv4 capability. I am not an expert in Perl and would appreciate if someone could look at my code and work out what is wrong. The version I am working on is simply to allow people to have IPv6 access until people are ready to add IPv6 support to the official build.
If you think you can help, please could you let me know.
When chasing an interoperability problem between a SOAP::Lite server and a Microsoft .NET client, I found what looks like a bug in HTTP::Daemon.pm:
When a request contains the "Expect: 100-continue" header, the code in HTTP::Daemon.pm reacts by sending
100 Continue<CRLF> (with 1 CRLF)
This causes the Microsoft client to hang, because it obviously expects 2 CRLFs after the 100 Continue header.
I don't know the RFC, but shouldn't there be 2 CRLFs (i.e. a line "100 Continue" followed by an empty line)?
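As a small illustration of the framing in question: an HTTP/1.1 response, interim or final, consists of a status line, zero or more header lines and then an empty line, so a header-less "100 Continue" ends with two CRLFs. The sketch below only builds that byte string; the function name interim_continue is made up and this is not the HTTP::Daemon code.

```perl
use strict;
use warnings;

# Hypothetical helper: build a complete interim response with no headers.
# \015\012 is CRLF; the second CRLF is the empty line that ends the response.
sub interim_continue {
    my ($proto) = @_;
    $proto = "HTTP/1.1" unless defined $proto;
    return "$proto 100 Continue\015\012" . "\015\012";
}

my $bytes = interim_continue();
```

A client that waits for the empty line before sending the request body would block indefinitely if only the first CRLF were sent, which matches the hang described above.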
I've just created a module that will help you with the task of automating ASP.NET sites. Please check out HTML::TreeBuilderX::ASP_NET and HTML::TreeBuilderX::ASP_NET::Roles::htmlElement and give some feedback.
I have seen this issue addressed on this board in the past, but can't figure out exactly what I need to do.
I'm using WWW::Mechanize and a webpage seems to be redirecting me, but Mechanize doesn't seem to follow it. I've been told to "Add the header 'Accept: text/html'", but alas I don't know how to add the header, or what that means.
I suspect the solution is simple beyond belief, but alas I'm not coming up with it. Any help would be much appreciated.
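For reference, a minimal sketch of what "adding the header" means in WWW::Mechanize: add_header registers a header that is then sent with every request the object makes. Whether this cures the redirect problem depends on the site; the URL in the comments is a placeholder and the fetch is left commented out so the sketch does no network access.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

# Send "Accept: text/html" with every request made through $mech.
$mech->add_header( 'Accept' => 'text/html' );

# Placeholder URL, not from the original post:
# $mech->get('http://www.example.com/');
# print $mech->uri, "\n";    # shows where any redirects ended up
```

Since WWW::Mechanize is a subclass of LWP::UserAgent, $mech->default_header('Accept' => 'text/html') should have a similar effect.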
In order to build and install modules like HTML::Parser from source you basically need the same compiler environment that was used to build perl itself. The 'cl' program that you are missing is the Microsoft C compiler.
Which perl are you using? Can I suggest that you just try ActivePerl, since it comes with LWP and related modules ready to go.
Regards, Gisle
On Fri, Jul 18, 2008 at 10:37 AM, jeyasimhan m <jai_dgl@yahoo.co.in> wrote:
> Hi,
>
> When I try to install HTML::Parser as well as Bundle::LWP I'm getting this following error
>
> perl -MCPAN -e "install Bundle::LWP"
>
> C:\Perl\bin\perl.exe C:\Perl\lib\ExtUtils\xsubpp -typemap C:\Perl\lib\ExtUtils\typemap -typemap typemap Parser.xs > Parser.xsc && C:\Perl\bin\perl.exe -MExtUtils::Command -e mv Parser.xsc Parser.c
> cl -c -nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX -MD -Zi -DNDEBUG -O1 -DVERSION=\"3.56\" -DXS_VERSION=\"3.56\" "-IC:\Perl\lib\CORE" -DMARKED_SECTION Parser.c
> 'cl' is not recognized as an internal or external command,
> operable program or batch file.
> NMAKE : fatal error U1077: 'C:\WINDOWS\system32\cmd.exe' : return code '0x1'
>
>
> Please help me to install these modules
>
> Thanks
> Jey
On Sun, Jul 6, 2008 at 7:06 PM, Bill Moseley <moseley@hank.org> wrote:
> On Sun, Jul 06, 2008 at 02:36:10PM +0200, Gisle Aas wrote:
>> > $req->set_default_accept_encoding;
>>
>> I don't like defaults to be set at that level given that we already
>> have a $ua->default_header() method, so I think it should be something
>> like:
>>
>> $ua->default_header("Accept-Encoding", join(",",
>> HTTP::Message::decodable()));
>
> I saw this yesterday, too: in the absence of an Accept-Encoding
> header, a server "MAY" send any encoding.
>
> http://use.perl.org/~rhesa/journal/25952
>
> That may be one argument for having a default, but in practice I'd
> expect it very rare for a server to compress w/o an Accept-Encoding
> header sent by the client.
I think it's correct to not have a default by default. The library can't really know what's acceptable without the app telling it. If the app for instance wants to mirror the file then the encoding does not matter. All it cares about is storing the bytes as sent from the server.
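A minimal sketch of an application opting in explicitly, assuming a libwww-perl release that provides HTTP::Message::decodable(). In scalar context that function returns a comma-separated string of the content encodings the local installation can undo, which is the form a header value needs.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Message;

my $ua = LWP::UserAgent->new;

# Advertise only the encodings that decoded_content() can handle here.
# The exact list depends on which decompression modules are installed.
$ua->default_header( 'Accept-Encoding' => scalar HTTP::Message::decodable() );

my $advertised = $ua->default_header('Accept-Encoding');
```

An application that only mirrors files to disk would simply skip this call and store the bytes exactly as the server sent them.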
I've now implemented a HTTP::Message::decodable() function with this patch:
>> > I'm not clear if there's a need to also specify a quality for the
>> > encodings in the Accept-Encoding header.
>>
>> I don't think we need to worry about this initially.
>
> And the RFC says qvalues are not permitted with x-gzip and x-compress.
Good.
>
>> > I kind of wonder why $res->content is not decoded by default (and
>> > provide $res->raw_content for those that need it).
>>
>> It's mostly because of history and compatibility with the original
>> content() method. Both are useful in different contexts. I don't
>> find the current situation bad. Since decoded_content() can be
>> expensive and can fail I think the longer name makes it obvious what's
>> going on and how you should use it.
>
> Agreed, it's not something that could change. I was just lamenting
> how often I see $res->content used in existing programs and modules.
>
> I don't see using $res->decoded_content as more expensive. If you
> need decoded content (which is likely the typical use) then you have
> to decode it -- no way around that.
Right. But there are apps that don't need this too. Also, knowing that it might be expensive might give you the hint needed to understand that you might want to cache the result of decoded_content instead of calling it over and over for the same message. That is, code like this might not be a good idea:
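As a hedged illustration of the caching advice (this is not the snippet from the original message, and the response below is constructed by hand rather than fetched), the sketch calls decoded_content once, checks the result, and then works on the saved copy instead of decoding the same message again for each operation.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built response for illustration: UTF-8 bytes for "café\nthé\n".
my $res = HTTP::Response->new(
    200, "OK",
    [ "Content-Type" => "text/plain; charset=UTF-8" ],
    "caf\xC3\xA9\nth\xC3\xA9\n",
);

# Decode once and keep the result; decoded_content can fail, so check it.
my $text = $res->decoded_content;
die "could not decode content" unless defined $text;

# Reuse $text rather than calling $res->decoded_content again each time.
my @lines = split /\n/, $text;
my $chars = length $text;
```

Calling $res->decoded_content inside a loop or in several places would repeat the decompression and charset decoding each time for the same message.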
>
>
> I can only guess that the beginners are more likely to use
> $res->content directly (as that's the example in the SYNOPSIS) and
> they perhaps are on slower connections where compression would help
> both the server and client. But, it's not breaking anything to not
> use compression.
>
> Ignoring decoding (charset), on the other hand, is probably wrong in
> most cases -- even though it's easy to ignore.
>
>
> You have this in the SYNOPSIS of LWP::UserAgent:
>
>     if ($response->is_success) {
>         print $response->content; # or whatever
>     }
>
> which is, perhaps accidentally, correct since you are printing
> un-decoded (charset) content. But, I doubt most users are just using
> LWP to print content out directly.
>
> How would you feel about providing new users with more guidance in the
> SYNOPSIS? That is, use decoded_content in the synopsis for those of
> us that often don't get past that section of the man page.
>
>     if ($response->is_success) {
>         $content = $response->decoded_content;
>     }
I've changed this occurrence now. I also noticed that other places already used decoded_content in similar examples, so this is just an oversight because these examples predate the decoded_content method.
>
> Now, I suspect that LWP::Simple really should be returning
> decoded_content -- but again, I don't know how to change that one
> without breaking a large number of existing scripts.
I'm not sure either. Perhaps just document it better.
>
>
> I think I asked about this some time ago, but might be good for
> HTTP::Message to have decoded_content wrap two methods for
> un-compressing and the charset decoding. There might be a case where
> we would want uncompressing but not decoding.
You get that by calling $mess->decoded_content(charset => "none")
I don't think there is a use-case for trying to decode the charset without undoing the Content-Encoding first.
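A small sketch of the three views of one message, using a hand-built gzip-compressed response (made up for illustration, assuming IO::Compress::Gzip and a libwww-perl that can undo gzip are installed): content gives the bytes as sent, decoded_content(charset => "none") undoes only the Content-Encoding, and plain decoded_content also decodes the charset.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

my $bytes = "caf\xC3\xA9\n";    # UTF-8 encoded text, 6 bytes
my $gz;
gzip( \$bytes => \$gz ) or die "gzip failed: $GzipError";

# Hand-built response for illustration.
my $res = HTTP::Response->new(
    200, "OK",
    [
        "Content-Type"     => "text/plain; charset=UTF-8",
        "Content-Encoding" => "gzip",
    ],
    $gz,
);

my $as_sent      = $res->content;                              # still compressed
my $uncompressed = $res->decoded_content( charset => "none" ); # bytes, gzip undone
my $text         = $res->decoded_content;                      # characters
```

The middle form is the one to use when a consumer expects raw bytes and wants to handle the character encoding itself.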
> Hum, I'm not clear about this, but I wonder if the response content is XML
> that will be passed to, say, XML::LibXML should it be passed decoded
> or not.
You just have to read the documentation for XML::LibXML to find out what kind of input it expects. Does it want Unicode strings or bytes? LWP can provide both.
On Sun, Jul 06, 2008 at 02:36:10PM +0200, Gisle Aas wrote:
> > $req->set_default_accept_encoding;
>
> I don't like defaults to be set at that level given that we already
> have a $ua->default_header() method, so I think it should be something
> like:
>
> $ua->default_header("Accept-Encoding", join(",",
> HTTP::Message::decodable()));
I saw this yesterday, too: in the absence of an Accept-Encoding header, a server "MAY" send any encoding.
http://use.perl.org/~rhesa/journal/25952
That may be one argument for having a default, but in practice I'd expect it very rare for a server to compress w/o an Accept-Encoding header sent by the client.
> > I'm not clear if there's a need to also specify a quality for the
> > encodings in the Accept-Encoding header.
>
> I don't think we need to worry about this initially.
And the RFC says qvalues are not permitted with x-gzip and x-compress.
> > I kind of wonder why $res->content is not decoded by default (and
> > provide $res->raw_content for those that need it).
>
> It's mostly because of history and compatibility with the original
> content() method. Both are useful in different contexts. I don't
> find the current situation bad. Since decoded_content() can be
> expensive and can fail I think the longer name makes it obvious what's
> going on and how you should use it.
Agreed, it's not something that could change. I was just lamenting how often I see $res->content used in existing programs and modules.
I don't see using $res->decoded_content as more expensive. If you need decoded content (which is likely the typical use) then you have to decode it -- no way around that.
I can only guess that the beginners are more likely to use $res->content directly (as that's the example in the SYNOPSIS) and they perhaps are on slower connections where compression would help both the server and client. But, it's not breaking anything to not use compression.
Ignoring decoding (charset), on the other hand, is probably wrong in most cases -- even though it's easy to ignore.
You have this in the SYNOPSIS of LWP::UserAgent:
    if ($response->is_success) {
        print $response->content; # or whatever
    }
which is, perhaps accidentally, correct since you are printing un-decoded (charset) content. But, I doubt most users are just using LWP to print content out directly.
How would you feel about providing new users with more guidance in the SYNOPSIS? That is, use decoded_content in the synopsis for those of us that often don't get past that section of the man page.
    if ($response->is_success) {
        $content = $response->decoded_content;
    }
Now, I suspect that LWP::Simple really should be returning decoded_content -- but again, I don't know how to change that one without breaking a large number of existing scripts.
I think I asked about this some time ago, but might be good for HTTP::Message to have decoded_content wrap two methods for un-compressing and the charset decoding. There might be a case where we would want uncompressing but not decoding.
Hum, I'm not clear about this, but I wonder if the response content is XML that will be passed to, say, XML::LibXML should it be passed decoded or not.
-- Bill Moseley moseley@hank.org Sent from my iMutt
On Sat, Jul 5, 2008 at 6:39 PM, Bill Moseley <moseley@hank.org> wrote:
> HTTP::Message has a decoded_content() method that will attempt
> to uncompress based on the Content-Encoding header in the response.
>
> It's wrapped in an eval which will trap exceptions when trying to
> require the modules used to uncompress the content.
>
> It would make sense that I would set Accept-Encoding based on if I
> have those modules installed.
Right.
> Since the list of modules (Compress::Zlib and Compress::Bzip2) is
> internal to HTTP::Message, would it make sense to provide a method
> that could set the Accept-Encoding based on what HTTP::Message uses?
> Something like:
>
> $req->set_default_accept_encoding;
I don't like defaults to be set at that level given that we already have a $ua->default_header() method, so I think it should be something like:

    $ua->default_header("Accept-Encoding", join(",", HTTP::Message::decodable()));

> I'm not clear if there's a need to also specify a quality for the
> encodings in the Accept-Encoding header.
I don't think we need to worry about this initially.
> This can't be the default as it would break existing users.
>
> I often notice code that uses $res->content instead of
> $res->decoded_content. Most of the time it seems like users really
> want the decoded content.
>
> I kind of wonder why $res->content is not decoded by default (and
> provide $res->raw_content for those that need it).
It's mostly because of history and compatibility with the original content() method. Both are useful in different contexts. I don't find the current situation bad. Since decoded_content() can be expensive and can fail, I think the longer name makes it obvious what's going on and how you should use it.
HTTP::Message has a decoded_content() method that will attempt to uncompress based on the Content-Encoding header in the response.
It's wrapped in an eval which will trap exceptions when trying to require the modules used to uncompress the content.
It would make sense that I would set Accept-Encoding based on if I have those modules installed.
Since the list of modules (Compress::Zlib and Compress::Bzip2) is internal to HTTP::Message, would it make sense to provide a method that could set the Accept-Encoding based on what HTTP::Message uses? Something like:
$req->set_default_accept_encoding;
I'm not clear if there's a need to also specify a quality for the encodings in the Accept-Encoding header.
This can't be the default as it would break existing users.
I often notice code that uses $res->content instead of $res->decoded_content. Most of the time it seems like users really want the decoded content.
I kind of wonder why $res->content is not decoded by default (and provide $res->raw_content for those that need it).
-- Bill Moseley moseley@hank.org Sent from my iMutt