If you have mbstring.func_overload configured to alias mb_strlen for strlen (i.e. when the 2 bit is flipped), then strlen starts counting characters, not bytes. If you need to count the number of bytes, it's not obvious how you're supposed to do it.
This is how I did it:
In places where I really needed to know the number of bytes, I used a homebrewed function, byte_count, instead of strlen. Here's the function definition for byte_count:
    function byte_count($val) {
        $len = (function_exists('mb_strlen')) ? mb_strlen($val, 'latin1') : strlen($val);
        return $len;
    }
Perl is hokey about it too. The length function is supposed to count the number of characters, but if you want to force it to count bytes, you need to use the bytes pragma. From the manpage:
    $x = chr(400);
    print "Length is ", length $x, "\n";     # "Length is 1"
    printf "Contents are %vd\n", $x;         # "Contents are 400"
    {
        use bytes;
        print "Length is ", length $x, "\n"; # "Length is 2"
        printf "Contents are %vd\n", $x;     # "Contents are 198.144"
    }
Java is not without its pickiness, but at least it has byte and char as distinct primitives.
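For comparison, here's a minimal Java sketch of the same character-versus-byte distinction, using the same code point as the Perl example (400). The class and method names are made up for illustration. Note that String.length() counts UTF-16 code units, so it only matches the "character" count for text inside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

final class ByteVsCharCount {
    // Number of UTF-16 code units (Java chars), not bytes.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String x = String.valueOf((char) 400); // same code point as Perl's chr(400)
        System.out.println("chars: " + charCount(x));     // chars: 1
        System.out.println("bytes: " + utf8ByteCount(x)); // bytes: 2
    }
}
```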
( Dec 18 2004, 12:50:23 AM PST ) Permalink
Apparently, there's some Apache goodness available for this now. At least I think it sounds good! Ian Holsman has written mod_ip_count for Apache 2.0. It uses the APR portability layer and memcached for shared state (actually apr_memcache from Paul Querna). This would enable a whole server farm to keep track of request rates from specific IP addresses and throttle them.
( Dec 15 2004, 04:24:13 PM PST ) Permalink
This weekend Technorati's network and server infrastructure is going to move. In one big fell swoop. Well, hopefully nothing will fall.
The home page sez: "Movin' on up" 'cause Technorati is substituting the Jeffersons' theme song for the old ops/facilities anthem, the Talking Heads' "Burning Down The House".
( Dec 14 2004, 12:47:43 AM PST ) Permalink
Working on a recent Japanese localization project was an eye-opening experience. It turns out that java.util.Properties expects ISO-8859-1 characters. I guess that's the downside of having a super-simple file format. I got the localized display bootstrapped by using native2ascii to get the UTF-8 localization text rendered as escaped Unicode. On a one-off basis, that's easy enough. But collaborative development always raises the tools question: how do folks typically manage this?
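To make the escaping step concrete, here's a rough Java sketch of what a native2ascii-style conversion does: anything outside 7-bit ASCII becomes a \uXXXX escape, which java.util.Properties turns back into the original character when it loads the file. The PropertiesEscaper class and the "title" key are made up for illustration; this is not the native2ascii source, just an approximation of its effect.

```java
import java.io.StringReader;
import java.util.Properties;

final class PropertiesEscaper {
    // Replace every char outside 7-bit ASCII with a backslash-u escape
    // (four hex digits), leaving plain ASCII untouched.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Three Japanese characters, written as escapes so this source file stays ASCII.
        String japanese = "\u65e5\u672c\u8a9e";
        String line = escape("title=" + japanese);
        System.out.println(line);

        // Properties decodes the escapes back into the original characters.
        Properties props = new Properties();
        props.load(new StringReader(line));
        System.out.println(japanese.equals(props.getProperty("title")));
    }
}
```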
What about input encoding? If there's an HTML form on a page and the input has multibyte characters in the query string (or POST data), are characters escaped to ISO-8859-1? My recollection was that HTTP headers must be ISO-8859-1.... but looking at the docs for PHP's mbstring and the encoding_translation parameter, it looks like server-side handling of the request needs to account for other character set encodings. Do browsers honor charset specification as a form attribute, like
    <form action=... method=... accept-charset="UTF-8">
(looks like Struts supports this), or is it presumed that the browser always escapes Unicode? Or perhaps they simply URL-encode the characters so it's a non-issue? On the server side, must the request handling do this
    request.setCharacterEncoding("UTF-8");
    String raw = request.getParameter("foo");
    String clean = new String(raw.getBytes("ISO-8859-1"), "UTF-8");
or is it all supposed to transparently just work (obviating String cleansing) if request.setCharacterEncoding("UTF-8") is used? For all of the hand-waving in the docs for ResourceBundle, etc., establishing a clear practice for input String handling in a webapp remains murky.
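Here's a self-contained sketch of the byte-level mechanics behind that re-decoding trick, with no servlet container involved. The ParameterRedecoder class and its method names are hypothetical. It only shows why the trick can recover the text when UTF-8 bytes were decoded as ISO-8859-1; it doesn't settle whether a given container actually needs it.

```java
import java.nio.charset.StandardCharsets;

final class ParameterRedecoder {
    // Simulates a container that received UTF-8 bytes but decoded them as ISO-8859-1.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps each byte to exactly one char, so getBytes() hands back
    // the raw bytes unchanged and they can be decoded again as UTF-8.
    static String redecode(String raw) {
        return new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String intended = "\u65e5\u672c\u8a9e"; // three Japanese characters
        String raw = misdecode(intended);
        System.out.println(raw.length());                     // 9: one char per UTF-8 byte
        System.out.println(redecode(raw).equals(intended));   // true
    }
}
```

If the container already decoded the parameter correctly, running it through redecode would mangle it, so the trick should only be applied when the mis-decoding is known to have happened.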
As far as sending responses, is it safe to always just send UTF-8 and include "charset=UTF-8" in the Content-type header? Is it standard practice to presume that the client will send a request header Accept-Charset (which indicates what an acceptable response is)? If they send it and UTF-8 isn't on the list, must the server go through a big String re-writing exercise to encode response to the browser's preference or is UTF-8 presumed to be implicitly acceptable at all times?
So many questions... I'm still digging for answers.
( Dec 12 2004, 11:51:01 PM PST ) Permalink
You can do this in tiles-defs.xml:
    <definition name=".dog" extends=".animal.layout">
        <put name="body" value=".dog.display" />
        <put name="head" value=".dog.head" />
    </definition>
    <definition name=".cosmos.head" extends=".head">
        <put name="titleKey" value="dog.title" />
    </definition>
    <definition name=".dog.display" controllerUrl="/dog.do" path="/tile/dog.vm" />
and so forth. Declarative tile composition works just fine. But what about programmatic composition at runtime?
With JSTL and struts, I can do this:
    <c:forEach var="bit" items="${kibble}">
        <tiles:insert page="/tile/bark.jsp">
            <tiles:put name="bit" beanName="bit" />
        </tiles:insert>
    </c:forEach>
I would imagine that the Velocity equivalent would look like this:
    <ol>
    #foreach ($bit in $kibble)
        $tiles.put("/tile/bark.vm", { "bit" : $bit })
    #end
    </ol>
but alas, it's not implemented by TilesTool. I can work around this by moving "bark.vm" to its own velocimacro, but that is fugly as hell. I would prefer parameterized components.
( Dec 07 2004, 06:53:07 AM PST ) Permalink
In JSP with struts tags, it looks like this (assume web.xml has "struts-logic" mapped):
    <%@ taglib uri="struts-logic" prefix="logic" %>
    <logic:redirect forward="home"/>
But what about Velocity? Well, it turns out that the VelocityViewServlet stuffs the basic servlet container things into the Velocity context, much like JSTL does in JSPville. Ergo, the $request object itself can be invoked like this:
    $request.getRequestDispatcher("/home.do").forward($request,$response)
Seems kinda grotty to not be able to use the Struts symbolic name, but so far that's where my read of the Velocity docs has taken me. As I unpeel the onion, I may be inspired to subclass the VelocityViewServlet as a StrutsViewServlet... it seems like however you're invoking the rendering, you should be able to access, if present, other runtime services such as Struts, Spring, etc.
( Dec 06 2004, 10:05:35 AM PST ) Permalink
When folks say "service oriented architecture" it still connotes monolithicism to me. An architecture implies a level of structure definition that sounds rigid; can you re-pour that foundation to adapt to redrawn plans? Software development agility and loose coupling should reinforce each other. I prefer to think of architectures and ecosystems. A service oriented functionality ecosystem supplies application functionality as a suite of services. Supporting requirements (as opposed to the core business requirements) such as security, logging, persistence, redundancy and caching are each handled independently; they in turn may be provisioned as services that higher level services rely on. This is part of the evolution under way at Technorati; some of the changes are evident in Dave's recent posts but some are just revisions that we're quietly rolling out.
Queues and distributed memory caches are natural elements of such an environment. In the December issue of Linux Journal, Technorati's use of open source building blocks such as memcached is discussed by Doc Searls.
This is the game:
A memcached server (or a set of servers) can be accessed over the network to store things in a table kept in RAM. When storing things, you can specify a maximum age for the cache entry -- if you go back to fetch it and the elapsed time since it was stored exceeds that age, it gets treated as a cache miss.
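To illustrate the maximum-age behavior, here's a small in-memory sketch in Java. It is only a stand-in: a real memcached server handles expiry itself on the server side, and the ExpiringCache class, its method signatures and the explicit "now" argument (there to keep the example deterministic) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class ExpiringCache<K, V> {
    // One stored value plus the bookkeeping needed to decide whether it is stale.
    private static final class Entry<V> {
        final V value;
        final long storedAtMillis;
        final long maxAgeMillis;

        Entry(V value, long storedAtMillis, long maxAgeMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
            this.maxAgeMillis = maxAgeMillis;
        }
    }

    private final Map<K, Entry<V>> table = new HashMap<>();

    void put(K key, V value, long maxAgeMillis, long nowMillis) {
        table.put(key, new Entry<>(value, nowMillis, maxAgeMillis));
    }

    // Returns null on a miss, including when the entry has outlived its maximum age.
    V get(K key, long nowMillis) {
        Entry<V> entry = table.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis - entry.storedAtMillis > entry.maxAgeMillis) {
            table.remove(key); // stale: drop it and report a miss
            return null;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        ExpiringCache<String, String> cache = new ExpiringCache<>();
        cache.put("story:42", "hello", 1000, 0);
        System.out.println(cache.get("story:42", 500));  // hello
        System.out.println(cache.get("story:42", 1500)); // null
    }
}
```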
Storing things in memcached with the timeout parameter and invalidating cache entries works as long as you have a consistent mechanism for calculating the key. If internally you're managing "stories" and each one has an "id" attribute that is unique (a primary key), that's a good candidate to store them with. So, for instance, putting memcached inside a content management system (CMS) "content service" seems natural. In babytalk code:
    public Story fetchStory(int storyId) {
        Story story = memc.get(storyId);
        if (story != null) // cache hit; perhaps more rigorous validation of the fetched object
            return story;
        // cache miss: load from the database and repopulate the cache
        story = StoryDB.findById(storyId);
        memc.put(storyId, story, AGE);
        return story;
    }
    public Story fetchStory(Map attrs) {
        // encapsulate whatever attributes uniquely identify a thing
        CacheKey key = new CacheKey(attrs);
        Story story = memc.get(key);
        if (story != null) // cache hit
            return story;
        // cache miss: load from the database and repopulate the cache
        story = StoryDB.findByAttrs(attrs);
        memc.put(key, story, AGE);
        return story;
    }
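The attribute-based lookup only works if the same attributes always produce the same key. Here's one possible sketch of that idea in Java: sort the attributes before joining them, so insertion order doesn't matter. Everything here is hypothetical (the StoryCache class, the key format, the HashMap standing in for memcached, and the loader function standing in for the database lookup); the original CacheKey class isn't shown, so this is an assumption about how such a key might be derived, not its actual implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class StoryCache {
    private final Map<String, String> cache = new HashMap<>();   // stand-in for memcached
    private final Function<Map<String, String>, String> loader;  // stand-in for the database lookup
    int loads = 0; // how many times the loader was actually called

    StoryCache(Function<Map<String, String>, String> loader) {
        this.loader = loader;
    }

    // Build a deterministic key: TreeMap sorts by attribute name, so two maps
    // with the same contents yield the same key regardless of insertion order.
    static String cacheKey(Map<String, String> attrs) {
        StringBuilder key = new StringBuilder("story");
        for (Map.Entry<String, String> e : new TreeMap<>(attrs).entrySet()) {
            key.append(':').append(e.getKey()).append('=').append(e.getValue());
        }
        return key.toString();
    }

    String fetchStory(Map<String, String> attrs) {
        String key = cacheKey(attrs);
        String story = cache.get(key);
        if (story != null) {
            return story; // cache hit
        }
        story = loader.apply(attrs); // cache miss: go to the backing store
        loads++;
        cache.put(key, story);
        return story;
    }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("section", "news");
        a.put("slug", "moving-day");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("slug", "moving-day");
        b.put("section", "news");

        StoryCache stories = new StoryCache(attrs -> "story body");
        stories.fetchStory(a);
        stories.fetchStory(b);
        System.out.println(cacheKey(a));   // story:section=news:slug=moving-day
        System.out.println(stories.loads); // 1
    }
}
```

A real memcached key has length and character restrictions, so a production version would likely hash or otherwise sanitize the joined string.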
We're in the process of evolving Technorati's infrastructure to one that is loosely coupled, redundant and robust. Our use of memcached is one of the enabling technologies of that evolution.
( Dec 05 2004, 09:22:23 AM PST ) Permalink
I usually only use "cvs import" to create a new CVS module, but it can also be used to do a "bulk add." Maybe it's common knowledge for CVS jockeys, but it's easy to forget about unless oft-used. Here's the scenario:
The only gripe I've heard about Eclipse that I haven't had a good answer for is the absence of Emacs key bindings. Otherwise, what's there not to dig about Eclipse?
I was hopeful that the EPIC plugin would provide at least some of those things for Perl development. This is what I found:
In the meantime, you can enjoy the fruits of this week's labor by pulling it off of CPAN; that's where you can get WebService::Technorati. It's also part of the latest release of the Technorati web services SDK. Thanks to David Wheeler for turning me on to Pod::Simple::HTML ...I'm still trying to figure out how he gets it to output nice docs from pod; mine didn't come out nearly that purty. Ah well, I guess that'll be part of next week's Perl fun.
( Nov 19 2004, 10:59:25 PM PST ) Permalink
It turns out that expat is not installed, grrr. So I fired up Fink Commander and had it gimme some expat lovin'. Tried it again -- bzzzt! This is what I did in the CPAN shell:
    cpan> o conf makepl_arg "EXPATLIBPATH=/sw/lib EXPATINCPATH=/sw/include"
    cpan> install XML::Parser
-- ding-ding-ding! We have a winner! XML::Parser installed! Thereafter, XML::XPath JFW'd and I'm on my way.
( Nov 17 2004, 04:53:44 PM PST ) Permalink
I poked around the Jakarta bug database and the only mention I could find that came close was PR 31442, which described having this
    <%@ page language="java" contentType="text/html; charset=UTF-8" %>
    <%@page pageEncoding="UTF-8"%>
and saying that the text was coming back ISO8859-1 when the page is requested as a GET but not as a POST. Well, someone from the Jakarta project marked the bug INVALID, glibly saying to ask on the users' mailing list and look at the Connector configuration because it's not a bug. WTF? Are you kidding?
Now I looked around in the Connector stanzas that come in the server.xml and see no mention of encoding configuration attributes. I've got a real simple test case.
    <% response.setContentType("text/xml"); %>
triggers no funny encoding behavior; I get the data out as good old UTF-8, just as I wanted. But if I did this
    <% response.setContentType("text/xml; charset=UTF-8"); %>
....kablooey! Mangled encoding! That's just wrong. And if it's not wrong, I think it warrants a better answer than RTFM on the Connectors.
And the problem may not just be isolated to JSP handling. Judging from other reports that are turning up in Google's index pertaining to SetCharacterEncodingFilter, it's affecting the filter implementation as well.
( Nov 07 2004, 02:46:58 AM PST ) Permalink
In the meantime, the Big Lie that waging war on Iraq has some relationship to 9/11 and terrorism apparently has been successfully Jedi mind-tricked into the American psyche and we're destined to have four more years of high crimes and misdemeanors. It just makes me wonder what is up with the rest of the country. Plenty of folks abroad are, evidently, equally perplexed by this election, as we see in a recent Daily Mail cover.
If you're single, there are some Canadians offering asylum. I'm thinking of packing up the family and moving to New Zealand or something. Just to keep track of where I don't want to be, I'm reckoning with the map:
Source: http://www.electoral-vote.com/
Do you live in a state of stupidity? Apparently 59,054,087 of you do.
Remember Lucilla in Gladiator? Yea, that's Connie Nielsen.
Here's a guy with two sons and a wife of 7 or 8 years going to fashion shows, art auctions and movie premieres with his Danish girlfriend. Oh, Lars: you're so damned Hollywood! Apparently the paparazzi in Denmark have kept tabs on them as well.
Back in the old days o' Metallica we had loads of fun but didn't go to fashion shows, art auctions and movie premieres. We didn't sip fine wines either. Oh well, I hope the dude is happy.
( Oct 23 2004, 07:29:35 PM PDT ) Permalink
After raising the notion with Tantek, he plugged in the trivial bit to enable this on the Technorati site.
Check it out: http://www.technorati.com/cosmos/referer.html (OK, so I'm not very popular in this big ol' cosmos but anyway...). This is what you do:
OK, I lied. It ain't about me, it's about our new office and the major milestones that Technorati is achieving, the agony of startup setbacks and the ecstasy of... having fun! The details:
Here's Dave's original post.
( Oct 21 2004, 04:34:11 PM PDT ) Permalink