~~~ Ramana Rao's INFORMATION FLOW ~~~ Issue 2.3 ~~ Mar 2003 ~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Information Flow is a monthly opt-in newsletter. Your email address was entered on www.ramanarao.com or www.inxight.com. You may forward this issue in its entirety.
Send me your thoughts and questions: [email protected]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~ IN THIS ISSUE ~~~ March 2003 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 * Introduction
 * 26000 Languages
 * A Review of Open Innovation by Henry Chesbrough
 * Light Linking on Language

~~~ Introduction ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I had fun with this month's issue, as I hope you will see. The first article and the link section focus on linguistics, a central element of the technology at Inxight. The second article is a review of a book about capitalizing on innovation, which I strongly recommend.

Enjoy! As always, comments are appreciated.

~~~ 26000 languages ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In 1981, Ron Kaplan and Martin Kay gave a paper on computational morphology that started the effort that has led to Inxight's core engine for linguistic analysis. Morphology is the area of linguistics that explores the structure of words.

Kaplan and Kay intended to work on the structure of sentences, but they took a step back to get warmed up. Then an interesting thing happened: they found an amazingly interesting world inside words. (It was like discovering that way down in the deepest ocean, under extreme pressure, with no light, and with geysers shooting out 400-degree-Celsius water at fire-hose speeds, there live tube worms, giant clams, and blind crabs in baths of sulphuric acid.)

If I were to give you a word that you didn't know (for example, "morphology"), you would know whether or not to bother looking it up in an English dictionary. Or, if you speak French, I could give you a word, and you might say: no way, not English, but it could be French.
So what is it you know? You know something about the structure of words in the languages you speak. Words are made of parts, and you recognize the parts: prefixes, suffixes, middles, endings, and so on. A linguist would speak of stems, morphemes, inflections, and many other terms that describe the pieces that make up words.

What Kaplan and Kay theorized is that the structure you see inside a word could be modeled with a simple kind of abstract computer called a finite state machine. In all languages. This is a theory about a "universal" in language, the kind of achievement that carries serious bragging rights among linguists. We won't go into the intricacies of finite state machines (FSMs). Basically, the theory means that you should be able to rip words apart perfectly, in all human languages. And fast.

Great theory; now on to reducing it to practice. All through the eighties, Kaplan and his colleagues developed tools and algorithms for working with really big finite state machines and achieving their theoretical speed. They also built tools that let linguists write lists of words, word parts, and assembly instructions in ways familiar to them. This was also a user interface problem: the linguists didn't speak FSM; they spoke linguisticano. Then, on through the eighties and nineties, linguists at universities around the world started to model the words of different languages with FSMs. Twenty years of such efforts have proven Kaplan and Kay right: words in all languages can be ripped apart fast. Inxight's core LinguistX Platform currently supports 26 languages.

If you know only English, you can hardly appreciate the challenges posed by other languages. The ripping action starts with breaking the stream of characters into words. Easy, you say? But Japanese doesn't use spaces between words, and the level of ambiguity is comparable to that in "parsing" an English sentence, the classic example being "time flies like an arrow".
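
The first step above, breaking an unspaced stream of characters into words, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Inxight's LinguistX engine and not Kaplan and Kay's finite-state transducers: it assumes a tiny, hypothetical word list and compiles it into a trie. A trie over a finite word list is itself a simple deterministic finite-state acceptor (one state per prefix, accepting states where words end), so segmentation amounts to running that automaton forward from each position and recording every accepting state it reaches. The names build_trie, segmentations, and END are illustrative.

```python
from functools import lru_cache

END = "<end>"  # marks an accepting state; never equals a single character


def build_trie(words):
    """Compile a word list into a trie, a deterministic finite-state
    acceptor: each node is a state, each character is a transition, and
    nodes carrying END accept a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word.upper():
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segmentations(text, trie):
    """Return every way to split `text` into words accepted by `trie`."""
    text = text.upper()

    @lru_cache(maxsize=None)
    def from_pos(i):
        # Each result is a tuple of words covering text[i:].
        if i == len(text):
            return ((),)  # exactly one way to segment the empty remainder
        results = []
        node = trie
        # Run the automaton forward from position i, one character at a time.
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break  # no word in the list continues this way
            if END in node:
                word = text[i:j + 1]
                for rest in from_pos(j + 1):
                    results.append((word,) + rest)
        return tuple(results)

    return [" ".join(seg) for seg in from_pos(0)]


# A tiny, hypothetical word list; a real lexicon would be far larger.
lexicon = build_trie(["to", "get", "her", "together", "the", "there"])
print(segmentations("TOGETHER", lexicon))  # ['TO GET HER', 'TOGETHER']
```

Even a six-word list gives two readings of one short string, a small taste of the ambiguity described above. A practical segmenter also has to choose among the readings, which this sketch does not attempt.
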
As a more direct example of putting spaces in, try: GODISNOWHERE.

The next step is to rip a word apart into its beginnings, middles, and ends. The beginnings and ends (and sometimes similar things in the insides) can get quite complicated in highly inflected languages. In Finnish, certain nouns and verbs have forms that have never been uttered. The middles can also get quite complicated, say, in compounding languages. You can probably construct a German word that would stretch from Hamburg to Munich if printed.

But it's not just about 26 languages; it's about 26000 languages. Not just Japanese or German, but also languages like Pfizerese and AstraZenecese. Every organization or community speaks its own language. Each is certainly based on one of the world's languages, but many of these nested languages have quite large vocabularies, and occasionally even language patterns that speakers of the base language wouldn't understand.

Certainly Dilbert knows this. Consider Marketingu and Engineerish. The comic strip pokes fun at languages that seem to have no purpose other than to hide incompetence or to keep the powerless in their cubicles. Yet nested languages often serve legitimate needs for economy of communication and for precision in specialized disciplines. You can hardly take a college or even a high school course without finding, at the end of each chapter, a section devoted to the special vocabulary of the subject. Words and concepts are the gateways into the ideas of a discipline.

This is another way to understand the importance of the languages we create to help people access, route, and mine content more effectively. We start with simple controlled word lists and move up to more formally structured "controlled vocabularies" and "taxonomies." Eventually we focus on the links and relationships between the words, and there we start to call the structures ontologies.
Certainly power comes with moving up the representational food chain, but we should not forget what sits at the bottom: the words, the single-cell organisms of shared thought.

~~~ Review of "Open Innovation" by Henry Chesbrough ~~~~~~~~~~~~

Open Innovation: The New Imperative for Creating and Profiting from Technology
By Henry William Chesbrough
~> http://ramanarao.com/cgi-bin/book.cgi?isbn=1578518377
Harvard Business School Press, March 2003

Maybe it was in the early nineties that I heard Ron Kaplan (the same one mentioned above) ask: how does PARC, with its 200 researchers, compete with, say, 100 garages with 2 ex-researchers each? The question came up during many years of discussion about Xerox "fumbling the future." In fact, there must be a Microsoft Word template file (*.dot) for journalists that starts: "Xerox PARC, the famed research center, nestled in the hills behind Stanford, invented blah blah blah ... and failed to capture the commercial value."

Henry Chesbrough, a Harvard Business School professor, has just published a book that maps and explains the world Ron Kaplan's question was gesturing at. Chesbrough interviewed me in 1997, just after we spun Inxight up but not quite out of Xerox, and I've talked with him a number of times over the years.

Open Innovation is carefully researched, well organized, articulate, and fun to read, and Chesbrough comes through with many clear observations and valuable insights. Even though I spent 10 years at PARC taking part in many discussions about capitalizing on research, and then took the spin-out journey myself, I came away with a broader perspective and a coherent framework for organizing my experiences. And if you are not interested in how large companies can capitalize on R&D, you will still find this book interesting if you care about innovation at all.
Just as the Open Source movement is not only about software and the software business but about business and social practices in general, Open Innovation is about the much broader economic and social realities that necessitate a change in how innovation is managed.

Xerox's failure provides the perfect starting point for the book, as a model of how you can hit the highest highs of invention and still sink to the deepest abyss, blah blah. Actually, the book is extremely fair about the challenges Xerox faced and doesn't rest on the simplistic theories of the Word template. Instead, it focuses on the broader context that enabled the truly radical inventions in the first place, and on the structural factors and social changes that made capitalizing on those inventions nearly impossible. I always felt that if Xerox had managed to control its ideas, *you* wouldn't be scrolling through this email right now. It was a massive, parallel social investment in various configurations of technologies, markets, and business models that really created the computing and networking infrastructure we all have now.

Chesbrough covers this with just the right amount of historical background and focused research. He looks at the birth of industrial research at the beginning of the 20th century, and at how industrial giants with near-monopolies on the practical knowledge of their fields were best served by a closed model of innovation. And he looks at more recent changes in the knowledge landscape: for example, the increasing availability of knowledge enabled by the growing mobility of highly skilled people, and the improved identification and realization of high-risk, high-reward market opportunities enabled by venture capital. All of this leads to open innovation as the best model for large companies going forward.
The model moves away from regulating and controlling knowledge and knowledge workers, toward fostering the effective flow of ideas into and out of a company. It offers a new vision of how a company can capture the greatest achievable share of the value of the ideas it generates. More broadly, the world of open innovation depicted in the book has implications for academic research and government policy, and even for small companies and teams of all kinds.

Beyond Xerox, Chesbrough looks at the successful transformation of IBM Research from a closed to an open model; at Intel's experience connecting with academic research and its use of venture capital; and at Lucent's corporate venture effort, which, successful as it was, ends on a grim note. Along the way, Chesbrough provides insights from Cisco, Microsoft, Merck, and many other companies.

I am a big fan of books that are intended for broad audiences and can be read by people who don't generally read books in that genre. For example, I like science books that non-scientists can read, that communicate the essential ideas of a subject simply, and that convey a sense of why anybody would ever choose to be a scientist. Substitute design, technology, or business into that sentence, and I would apply the same test. Chesbrough's book passes easily because of its mix of scholarship and practical insight. I would certainly recommend it to entrepreneurs and business managers, and also to scientists, designers, and technologists.

~~~ Light Linking on Language ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~> Kids, Creoles, and the Coconuts. Thinking about nested languages, I vaguely remembered something about pidgins and creoles. Searching the Web, I found this 1992 article on an interesting language experiment by Bickerton, one of the pioneers in the study of creoles.
~> At Discover.com, search for "Creoles" in the archive:
   http://www.discover.com/archive/index.html
   Or perhaps the following will work:
   http://208.245.156.153/archive/output.cfm?ID=25

~> The Ethnosphere and Language Extinction. Last month, I was enthralled by a talk by Wade Davis at the TED conference. Wade is an anthropologist and an explorer with the National Geographic Society. Juxtaposing the concept he coined, the Ethnosphere, with some reading on language extinction provokes social questions, and also raises questions about the relationship between language and thought.
~> http://www.sacredbalance.com/web/drilldown.html?sku=91
~> http://news.nationalgeographic.com/news/2002/06/0627_020628_wadedavis.html
~> http://abcnews.go.com/sections/world/DailyNews/endangered_languages.html
~> http://www.lsadc.org/web2/faq/endangered.htm

~> The relationship between language and thought has been a topic in linguistics and psychology for most of the 20th century. Benjamin Whorf, who worked as a fire prevention specialist for an insurance company and did linguistics research on the side, argued that language influences thought. Though Whorfian theory fell out of favor, more recent work is reconsidering the possibilities.
~> http://sciam.com/article.cfm?articleID=00009A6B-B402-1CDA-B4A8809EC588EEDF

~> Whorf to Worf. Speaking of Worf, there is a language that probably won't be going extinct any time soon. In fact, it may still need to be invented. It isn't transmitted from parent to child, but rather from trekkie to trekkie. It's Klingon.
~> http://www.kli.org/

~> The "Cold Fusion" of Linguists. Okay, if that's not enough quirkiness in my linking, try this one on the Nostratic hypothesis about the roots of language.
~> http://www.santafe.edu/~johnson/articles.nostratic.html

~> Meanwhile, for the seriously interested, Pinker's book beats a fistful (or fiveful) of light links. And it passes my book test above with flying colors.
Or, if you prefer a good documentary: years ago there was a series on PBS called The Story of English, with an excellent companion book.
~> The Language Instinct
   http://ramanarao.com/cgi-bin/book.cgi?isbn=0060958332
~> The Story of English
   http://ramanarao.com/cgi-bin/book.cgi?isbn=0140154051

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ramana Rao is Founder and CTO of Inxight Software, Inc.
Copyright (c) 2003 Ramana Rao. All Rights Reserved.
You may forward this issue in its entirety.
See: http://www.ramanarao.com
Send: [email protected]
Archive: http://www.ramanarao.com/informationflow/archive/
Subscribe: mailto:[email protected]
Unsubscribe: mailto:[email protected]