Books and other resources to learn R

Passive Income Systems

From Amy’s Page

12 Books and other resources to learn R

This article was originally posted on UCAnalytics. Link to full version is provided at the bottom.

1. R for Reference

R for Everyone: Advanced Analytics and Graphics – Jared P. Lander

YOU CANalytics Book Rating 5 Stars (5 / 5)

Jared Lander wastes no time on the base graphics that come pre-installed with R and jumps directly to the ggplot2 package, a far more advanced and sleek graphics package. This sets the tone for the book: don’t learn things you won’t use in real-life applications later. I highly recommend this book for a fast-paced way to learn R.

R in Action - Robert Kabacoff

YOU CANalytics Book Rating 5 Stars (5 / 5)

Here is another exceptional book for learning R on your own. Robert Kabacoff, the author, has done a phenomenal job: the organization is immaculate and the presentation is friendly. I highly recommend either this book or R for Everyone to start your journey with R.

The R Book – Michael J. Crawley

YOU CANalytics Book Rating 4.8 Stars (4.8 / 5)

With close to a thousand pages and vast coverage, ‘The R Book’ could be called the Bible for R. It starts with simple concepts in R and gradually moves to highly advanced topics. Its breadth shows in dedicated chapters on topics as diverse as data frames, graphics, Bayesian statistics, and survival analysis. Essentially, this is a must-have reference for any aspiring R programmer, although its sheer thickness could intimidate a beginner.

2. R with Theory

An Introduction to Statistical Learning: with Applications in R – Gareth James et al.

YOU CANalytics Book Rating 5 Stars (5 / 5)

This book is a high-quality statistical text with R as the software of choice. If you want to get comfortable with fundamental concepts while learning R, this is the book for you. That said, you will love this book even if you have studied advanced statistics. It also covers some advanced machine learning concepts such as support vector machines (SVM) and regularization. A great book by all means.

Machine Learning with R – Brett Lantz

YOU CANalytics Book Rating 4.5 Stars (4.5 / 5)

If you want to learn R from the machine learning perspective, then this is the book for you. Some people take great interest in the fine demarcation between statistics and machine learning; for me, however, the two overlap too much. I have given up on the distinction, as it makes no difference from an applications perspective. The book introduces the RWeka package; Weka is another open-source tool used extensively in academic research.

3. R with Applications

R and Data Mining: Examples and Case Studies – Yanchang Zhao

YOU CANalytics Book Rating 4.3 Stars (4.3 / 5)

Other books also use a case-study approach to teach R. I like this one because of the interesting topics it covers, including text mining, social network analysis and time series modeling. That said, the author could have put more effort into the formatting, which is plain ugly. At times, while skimming through it, you will feel you are reading a master’s-level project report. Once you get past this, however, the content is really good for learning R.

Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) – Graham Williams

YOU CANalytics Book Rating 4.2 Stars (4.2 / 5)

Rattle is no SAS Enterprise Miner or SPSS Modeler (both commercial GUI-based data mining tools). Trust me, though: apart from a few minor issues, Rattle is not at all bad. The book is a great reference for data mining with Rattle, a GUI add-on package for R. I really hope they keep working on Rattle, as it has a lot of potential.

4. R Graphics and Programming

ggplot2: Elegant Graphics for Data Analysis (Use R!) – Hadley Wickham

YOU CANalytics Book Rating 4 Stars (4 / 5)

‘ggplot2’ is an exceptional package for creating wonderful graphics in R. It is much better than the base graphics that come pre-installed with R, so I would recommend you start directly with ggplot2 without spending time on base graphics. ‘R for Everyone’, the first book we discussed, has a good introduction to ggplot2. However, if you want to explore ggplot2 in greater depth, then this is the book for you.

Though I prefer ggplot2, Lattice is another package on par with it. A good book to start with Lattice is ‘Lattice: Multivariate Data Visualization with R (Use R!)’ by Deepayan Sarkar.

Read full list.

Emerging Storage

Emerging Storage, VMware And Pivotal Drive EMC’s Q2 Earnings

Trefis Team, Contributor

EMC announced its second-quarter earnings on July 23, reporting 5% year-on-year growth in net revenues to $5.9 billion. The company’s services revenues rose by almost 9% over the prior-year quarter to $2.6 billion while its product revenues stayed flat at about $3.3 billion. Much of the growth was driven by VMware (+17%), Pivotal (+28%) and RSA Security (+7%) while core information storage revenues remained nearly flat at $4 billion.

EMC’s market share in external storage systems declined from 30.2% in Q1 2013 to 29.1% in the first quarter of 2014, according to a recent report by IDC. This was the first quarter since 2008 in which EMC’s market share declined year-over-year. EMC’s revenues from external storage systems in Q1 declined by almost 9% while the industry-wide decline was about 5%. However, EMC’s revenues in Q2 grew faster than the industry average, so the company regained share in the market.
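The share figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming the rounded rates quoted in the text (EMC down about 9%, the industry down about 5%); the function name is illustrative and not part of the original analysis.

```python
def implied_share(prior_share_pct, company_growth, industry_growth):
    """Market share implied by a prior share and two year-over-year growth rates."""
    return prior_share_pct * (1 + company_growth) / (1 + industry_growth)

# Figures as quoted in the article; the growth rates are rounded there.
share_q1_2014 = implied_share(30.2, -0.09, -0.05)
print(f"Implied Q1 2014 share: {share_q1_2014:.1f}%")
```

The result lands a little below the 29.1% that IDC reported, which is what one would expect given that "almost 9%" and "about 5%" are rounded inputs.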

Weakness in its core business led to market speculation prior to earnings about EMC spinning off VMware and Pivotal. The Wall Street Journal reported that external pressure from EMC’s large institutional investors could lead the company to spin off some of the fastest-growing businesses within the company such as VMware and Pivotal. However, EMC’s management refuted the speculation and stood by its “federation” business model, wherein some of the acquired companies operate as separate entities while they still collaborate on products for large clients. The company believes that its current setup is ideal for growth for both EMC and the acquired companies.

We have a $30 price estimate for EMC, which is roughly in line with the current market price.

See our full analysis for EMC’s stock

Key Areas Of Growth:

Emerging Storage

EMC’s Emerging Storage products such as XtremIO, Isilon, Atmos and VPLEX were largely responsible for the growth in hardware sales during the past few quarters. The Emerging Storage sub-segment grew by 51% year-over-year (y-o-y) in Q1 2014, which the company attributed to a strong customer response for these products. Despite strong y-o-y growth, the revenues generated by emerging storage solutions stayed flat over Q1. The company attributed this to intermittent demand for some large individual orders. The company expects strong growth for emerging storage solutions on the back of solid demand for software-defined storage, Big Data analytics, cloud storage and flash arrays in the coming quarters.


VMware

VMware’s revenues grew by 17% y-o-y to $1.45 billion for the June quarter, with growth coming from both product license revenues (+16%) and services revenues (+18%). However, VMware’s gross margin within EMC declined by 180 basis points over the prior-year quarter to 87.8%. The decline in VMware’s margins led EMC’s overall gross margin to decline by 40 basis points to 62.1%. EMC has invested over $6 billion in acquisitions and internal development since 2012, of which a significant portion was attributable to VMware-related products. These acquisitions included software-defined networking leader Nicira and mobility management leader AirWatch. All the acquisitions will show up as losses on the income statement this year. However, management believes that margins are likely to improve in future quarters (read: SDN, Hybrid Clouds And AirWatch Help VMware Post Strong Q2 Results).


Pivotal

Pivotal is among the fastest-growing divisions within the company, with 40% y-o-y growth in the first quarter. Although the growth rate in the second quarter was lower, at 29%, the number of orders rose by over 50%. Additionally, Pivotal’s margins expanded from the March quarter. Pivotal’s platform consists of new-generation data fabrics, application fabrics and a cloud-independent Platform-as-a-Service to support cloud computing and Big Data applications, which have started gaining traction among customers. Management mentioned that some of Pivotal’s growth may not be immediately visible in the numbers since it is building out a subscription-based revenue stream, which is likely to be beneficial in the long run.

RSA Security

RSA Security, EMC’s information security division, grew by over 11% to almost $1 billion in 2013. The growth continued in the first half of 2014, but the rate of growth was lower than 2013 at about 6% y-o-y. The information security industry is growing, with customers allocating more of their security budgets to intelligence-driven analytics, where RSA Information Security excels, rather than static prevention.


Microsoft Tries Appliances to Build Clouds




The Surface tablet, the Xbox One gaming console and a plethora of peripherals are not the only pieces of hardware in the Microsoft stable. The company is planning to launch new storage and hybrid Azure cloud devices to expand its capabilities and market share.

According to ZDNet, Microsoft is readying a storage appliance, the Azure StorSimple 8000, which connects to the Azure cloud and is based on technology from StorSimple, a company it acquired in 2012. The appliance keeps the most frequently used data in local storage while moving less frequently used files to the cloud and indexing them there.
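The hot-versus-cold split described above can be illustrated with a few lines of code. This is only a sketch of the general tiering idea, assuming a simple access-count ranking; it is not StorSimple's actual algorithm, and the function, file names and counts are hypothetical.

```python
def assign_tiers(access_counts, local_capacity):
    """Keep the most frequently accessed files local; place the rest in the cloud."""
    ranked = sorted(access_counts, key=lambda name: access_counts[name], reverse=True)
    local = set(ranked[:local_capacity])
    return {name: ("local" if name in local else "cloud") for name in access_counts}

# Hypothetical file names and access counts, for illustration only.
counts = {
    "q2_report.xlsx": 42,
    "archive_2009.zip": 1,
    "roadmap.pptx": 17,
    "old_logs.tar": 0,
}
tiers = assign_tiers(counts, local_capacity=2)
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

A real appliance would likely weigh recency and file size as well as raw access counts; the ranking here is kept deliberately simple.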


The Azure StorSimple appliance, slated for release in August, will connect to Azure StorSimple Manager, which will give users simplified access to, and management of, locally and remotely stored files.

Microsoft will continue to sell and support the StorSimple 5000 and 7000 series appliances, which also connect to the Azure cloud but do not integrate with Azure StorSimple Manager.

Unlike other appliances in the software giant’s fold, Microsoft is looking to channel partners – specifically systems integrators – to sell and deploy the StorSimple devices in enterprise and midmarket accounts for disaster recovery, primary and secondary storage, and platforms for application management.

Separate from StorSimple, Microsoft is reportedly gearing up for another run at the Azure in a box strategy. Plans for an Azure private cloud appliance, reportedly being developed under the code name “San Diego,” will provide enterprises with on-premises cloud, network and storage resources. Essentially, Microsoft is attempting to provide enterprises with the same cloud-based Azure functionality in their own data center.

Microsoft has been trying to release Azure appliances since 2010. The initial versions were announced with OEM partners such as Hewlett-Packard, Dell and Fujitsu. Only Fujitsu ended up releasing a commercial product. ZDNet reports the original program petered out in late 2012 even though no official announcement was made.

The new Azure appliance versions will reportedly come from and be supported by Microsoft, and sold through its systems integrator channel.

While pushing deeper into hardware to support its cloud strategy, Microsoft insists plenty of room exists in the market for its appliances and services as well as similar offerings by its traditional OEM partners. Nevertheless, the expanding hardware portfolio does provide further evidence that Microsoft is increasingly a competitor to companies such as Hewlett-Packard, Dell, Lenovo, EMC and IBM.

And, unlike with its Surface tablets, Microsoft seems to have no issue selling and supporting hardware devices through its B2B channels.

IBM Provides Cloud Services to California State Agencies

By Barry Levine. Updated July 24, 2014 1:57PM

There’s a big, new cloud coming to California, powered by IBM. The tech giant said Thursday it will be supplying cloud services for more than 400 state and local agencies. The service, called CalCloud, is the first of its kind in the U.S. at a state level. It will allow data and programs to be stored and made available to all participating agencies, which will only pay for the computing workload they actually use.

The cloud services need to comply with a range of requirements from such federal agencies as the IRS and the Social Security Administration, not to mention HIPAA (the Health Insurance Portability and Accountability Act) and the security standards of the National Institute of Standards and Technology.

‘Important Step’

Through CalCloud, agencies can now share a common pool of computing resources that the California Department of Technology said would be more efficient than the current setup. Nearly two dozen departments have requested IT services via CalCloud.

Marybel Batjer, secretary of the Government Operations Agency, said in a statement that CalCloud “is an important step towards providing faster and more cost-effective IT services to California state departments and ultimately to the citizens of California.”

IBM will be supplying and managing the infrastructure of CalCloud, and the state’s Department of Technology will take care of the other aspects. Big Blue also said it will work with the state to transfer knowledge and best practices relating to security and systems integration with the department.

As with other cloud services, this pay-for-use arrangement will enable the state agencies to scale the resources they need up or down for variable workloads. It also provides immediate and round-the-clock access to such configurable resources as compute, storage, network and disaster recovery services.

High Performance, Watson

IBM has been rapidly building up its cloud services and creating more than a hundred software-as-a-service solutions for specific industry needs. The CalCloud project will likely become the basis for similar offerings to other states, as well as to other governments worldwide.

In other IBM news, the company said Wednesday that it will be making high performance computing more accessible through the cloud to clients that need additional capabilities for big data and other computationally intensive workloads.

Very high data throughput speeds will be enabled from IBM’s SoftLayer company, using InfiniBand networking technology to connect SoftLayer bare metal servers. InfiniBand is a networking architecture that delivers up to 56 Gbps.
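To put the quoted 56 Gbps figure in perspective, here is a rough back-of-the-envelope sketch. It assumes an ideal link running at the peak rate with no protocol overhead, and the 1 TB data set is a hypothetical example rather than a figure from IBM or SoftLayer.

```python
LINK_GBPS = 56  # peak InfiniBand rate quoted in the article

def transfer_seconds(size_terabytes, link_gbps=LINK_GBPS):
    """Ideal transfer time at line rate, ignoring protocol overhead."""
    bits = size_terabytes * 1e12 * 8  # decimal terabytes converted to bits
    return bits / (link_gbps * 1e9)

seconds = transfer_seconds(1.0)  # hypothetical 1 TB data set
print(f"1 TB at {LINK_GBPS} Gbps: about {seconds:.0f} seconds")
```

Real-world throughput would be lower once encoding and protocol overhead are taken into account, so this is best read as a lower bound on transfer time.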

SoftLayer CEO Lance Crosby said in a statement that “our InfiniBand support is helping to push the technological envelope while redefining how cloud computing can be used to solve complex business issues.”

Also on Wednesday, IBM and financial services firm USAA announced that IBM Watson intelligence-as-a-service technology will now be employed for USAA members. It is the first commercial use of Watson in a consumer-facing role. Watson will be used in a pilot project to help military men and women transition from military to civilian life.




Box Raises More Money

Box Raises More Money, Cloud Questions




Cloud storage and content management company Box is fast becoming a focal point of the cloud computing era. While other cloud ventures such as NetSuite have become productive service providers, Box plods along with high cash-burn rates and an indeterminate exit strategy.

Yesterday, Box announced it raised another $150 million in fresh venture funding, adding to its $80 million in cash reserves and bringing its total investment backing to $450 million. The company is now worth, by some estimates, $2.4 billion, even though its revenues are somewhere around $200 million.

What makes Box an interesting study is its expenses. Until recently, the company spent much more on marketing and communications than on anything else in its operations. According to Forbes, Box spent $171 million on sales and marketing in 2013 – nearly a third more than its total revenue. The company says its business model, which relies on adding accounts and subscribers, requires heavy investments in sales, marketing and infrastructure.

Box’s high expenses have long been a sore spot. The company is showing signs of reining in expenses and expanding sales faster than spending. In the first quarter of 2014, marketing spending was still up 40 percent over the same quarter in 2013, but sales doubled.

The challenge Box faces is the same as for many cloud service providers. Cloud revenues compound over time, and deferred revenue counts for more than point-in-time sales. Box is counting nearly $90 million in deferred revenue from the first quarter – double the figure for the same period in 2013 – and it has added more than 5,000 paid corporate accounts. All cloud service providers see weakness in revenues while building their base. If they manage the transition period, they will hit an inflection point where compounding recurring revenue will exceed and accelerate past expenses.
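The inflection point described above can be sketched with a toy model. This is a deliberately simplified illustration, assuming constant quarterly expenses, a fixed amount of new recurring revenue added each quarter and a fixed renewal rate; all the numbers are hypothetical and are not Box's figures.

```python
def breakeven_quarter(new_recurring, retention, expenses, max_quarters=40):
    """Return the first quarter in which recurring revenue exceeds expenses, or None."""
    recurring = 0.0
    for quarter in range(1, max_quarters + 1):
        # Existing subscriptions renew at the retention rate; new ones are added on top.
        recurring = recurring * retention + new_recurring
        if recurring > expenses:
            return quarter
    return None

# Hypothetical inputs in millions of dollars per quarter.
quarter = breakeven_quarter(new_recurring=10.0, retention=0.95, expenses=60.0)
print(f"Recurring revenue passes expenses in quarter {quarter}")
```

With these inputs the crossover comes in the seventh quarter. With poor retention (say 0.5) the function returns None, because recurring revenue levels off below expenses, which is why subscriber retention matters so much in this model.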

Another company experiencing this phenomenon is Adobe. In 2013, Adobe abandoned its traditional software licensing model to embrace cloud subscriptions. Initially, Adobe revenues and profits plummeted to the point where alarms were going off on Wall Street and among partners and users. The precipitous dip made many question whether Adobe could weather the financial transition.

Today, Adobe is profitable and growing. Its compound recurring revenue – based on nearly 2.2 million paid users – is generating positive cash flow. And the company expects to exceed 3.3 million paid subscribers before the end of 2014.

Box is a bit different from many cloud providers, as it supports millions more unpaid users than paid subscribers. This puts a burden on the company to build infrastructure and support around that broader base, which adds expenses. However, Box may prove the broader base is worth the expense, as those unpaid users contribute to the conversion of net-new paid accounts.

The ultimate lesson Box may prove is that marketing makes a difference in building cloud brands. If Box turns the corner, goes public and becomes another cloud powerhouse, it will change the rules on what it takes to build a successful cloud-era business: loud and persistent marketing and communications.


Big Data: The 5 Vs Everyone Should Know

Big Data: The 5 Vs Everyone Should Know


Big Data is a big thing. It will transform our world completely and is not a passing fad that will disappear. To understand the phenomenon that is big data, it is often described using 5 Vs: Volume, Velocity, Variety, Veracity and Value.

I thought it might be worth restating what these 5 Vs are, in plain and simple language:

Volume refers to the vast amounts of data generated every second. Just think of all the emails, Twitter messages, photos, video clips, sensor data etc. we create and share every second. We are not talking Terabytes but Zettabytes or Brontobytes. On Facebook alone we send 10 billion messages per day, click the “like” button 4.5 billion times and upload 350 million new pictures every day. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute! This increasingly makes data sets too large to store and analyze using traditional database technology. With big data technology we can now store and use these data sets with the help of distributed systems, where parts of the data are stored in different locations and brought together by software.

Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds, the speed at which credit card transactions are checked for fraudulent activities, or the milliseconds it takes trading systems to analyze social media networks to pick up signals that trigger decisions to buy or sell shares. Big data technology now allows us to analyze the data while it is being generated, without ever putting it into databases.


Variety refers to the different types of data we can now use. In the past we focused on structured data that fits neatly into tables or relational databases, such as financial data (e.g. sales by product or region). In fact, 80% of the world’s data is now unstructured, and therefore can’t easily be put into tables (think of photos, video sequences or social media updates). With big data technology we can now use varied types of data (structured and unstructured) including messages, social media conversations, photos, sensor data, video or voice recordings and bring them together with more traditional, structured data.

Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of content), but big data and analytics technology now allows us to work with these types of data. The volumes often make up for the lack of quality or accuracy.

Value: Then there is another V to take into account when looking at Big Data: Value! It is all well and good having access to big data, but unless we can turn it into value it is useless. So you can safely argue that ‘value’ is the most important V of Big Data. It is important that businesses make a business case for any attempt to collect and leverage big data. It is so easy to fall into the buzz trap and embark on big data initiatives without a clear understanding of costs and benefits.

I have put together this little presentation for you to use when talking about or discussing the 5 Vs of big data.





Big Data Evolutionary Forecasting

Jonathan Losos

After comparing the DNA from different anole lizard species in the Caribbean, scientists found predictable patterns in their evolution.

By: Carl Zimmer

July 17, 2014

Michael Lässig can be certain that if he steps out of his home in Cologne, Germany, on the night of Jan. 19, 2030 — assuming he’s still alive and the sky is clear — he will see a full moon.

Lässig’s confidence doesn’t come from psychic messages he’s receiving from the future. He knows the moon will be full because physics tells him so. “The whole of physics is about prediction, and we’ve gotten quite good at it,” said Lässig, a physicist at the University of Cologne. “When we know where the moon is today, we can tell where the moon is tomorrow. We can even tell where it will be in a thousand years.”

Early in his career, Lässig made predictions about quantum particles, but in the 1990s, he turned to biology, exploring how genes evolved. In his research, Lässig was looking back in time, reconstructing evolutionary history. Looking ahead to evolution’s future was not something that biologists bothered doing. It might be possible to predict the motion of the moon, but biology was so complex that trying to predict its evolution seemed a fool’s errand.

But lately, evolution is starting to look surprisingly predictable. Lässig believes that soon it may even be possible to make evolutionary forecasts. Scientists may not be able to predict what life will be like 100 million years from now, but they may be able to make short-term forecasts for the next few months or years. And if they’re making predictions about viruses or other health threats, they might be able to save some lives in the process.

“As we collect a few examples of predictability, it changes the whole goal of evolutionary biology,” Lässig said.

Replaying the Tape of Life

If you want to understand why evolutionary biologists have been so loath to make predictions, read “Wonderful Life,” a 1989 book by the late paleontologist Stephen Jay Gould.

Michael Lässig

The book is ostensibly about the Cambrian explosion, a flurry of evolutionary innovation that took place more than 500 million years ago. The oldest known fossils of many of today’s major animal groups date to that time. Our own lineage, the vertebrates, first made an appearance in the Cambrian explosion, for example.

But Gould had a deeper question in mind as he wrote his book. If you knew everything about life on Earth half a billion years ago, could you predict that humans would eventually evolve?

Gould thought not. He even doubted that scientists could safely predict that any vertebrates would still be on the planet today. How could they, he argued, when life is constantly buffeted by random evolutionary gusts? Natural selection depends on unpredictable mutations, and once a species emerges, its fate can be influenced by all sorts of forces, from viral outbreaks to continental drift, volcanic eruptions and asteroid impacts. Our continued existence, Gould wrote, is the result of a thousand happy accidents.

To illustrate his argument, Gould had his readers imagine an experiment he called “replaying life’s tape.” “You press the rewind button and, making sure you thoroughly erase everything that actually happened, go back to any time and place in the past,” he wrote. “Then let the tape run again and see if the repetition looks at all like the original.” Gould wagered that it wouldn’t.

Although Gould only offered it as a thought experiment, the notion of replaying the tape of life has endured. That’s because nature sometimes runs experiments that capture the spirit of his proposal.

Predictable Lizards

For an experiment to be predictable, it has to be repeatable. If the initial conditions are the same, the final conditions should also be the same. For example, a marble placed at the edge of a bowl and released will end up at the bottom of the bowl no matter how many times the action is repeated.
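The marble example can be made concrete with a tiny simulation. This is a minimal sketch, assuming an idealized one-dimensional bowl with a linear restoring force and friction; the parameter values are arbitrary choices for illustration.

```python
def settle(x0, steps=20000, dt=0.001, stiffness=9.8, damping=1.0):
    """Simulate a damped marble in a parabolic bowl and return its final position."""
    x, v = x0, 0.0
    for _ in range(steps):
        a = -stiffness * x - damping * v  # restoring force plus friction
        v += a * dt                       # semi-implicit Euler update
        x += v * dt
    return x

# Release the marble from the same edge three times.
runs = [settle(1.0) for _ in range(3)]
print(runs)
```

Every run starts from the same initial conditions and ends at the same spot, essentially the bottom of the bowl, which is the sense of "repeatable" used here.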

Biologists have found cases in which evolution has, in effect, run the same experiment several times over. And in some cases the results of those natural experiments have turned out very similar each time. In other words, evolution has been predictable.

One of the most striking cases of repeated evolution has occurred in the Caribbean. The islands there are home to a vast number of native species of anole lizards, which come in a staggering variety. The lizards live in the treetops, on forest floors and in open grassland. They come in a riot of colors and shapes. Some are blue, some are green and some are gray. Some are huge and bold while others are small and shy.

To understand how this diversity evolved, Jonathan Losos of Harvard University and his students gathered DNA from the animals. After they compared the genetic material from different species, the scientists drew an evolutionary tree, with a branch for every lizard species.

Jonathan Losos measuring a lizard in the field.

When immigrant lizards arrived on a new island, Losos found, their descendants could evolve into new species. It was as if the lizard tape of life was rewound to the same moment and then played again.

If Gould were right, the pattern of evolution on each island would look nothing like the pattern on the other islands. If evolution were more predictable, however, the lizards would tend to repeat the same patterns.

Losos and his students have found that evolution did sometimes veer off in odd directions. On Cuba, for example, a species of lizard adapted to spending a lot of time in the water. It dives for fish and can even sprint across the surface of a stream. You won’t find a fishing lizard on any other Caribbean island.

For the most part, though, lizard evolution followed predictable patterns. Each time lizards colonized an island, they evolved into many of the same forms. On each island, some lizards adapted to living high in trees, evolving pads on their feet for gripping surfaces, along with long legs and a stocky body. Other lizards adapted to life among the thin branches lower down on the trees, evolving short legs that help them hug their narrow perches. Still other lizards adapted to living in grass and shrubs, evolving long tails and slender trunks. On island after island, the same kinds of lizards have evolved.

“I think the tide is running against Gould,” Losos said. Other researchers are also finding cases in which evolution is repeating itself. When cichlid fish colonize lakes in Africa, for example, they diversify into the same range of forms again and again.

“But the question is: What’s the overall picture?” Losos asked. “Are we cherry-picking the examples that work against him, or are we going to find that most of life is deterministic? No one is going to say Gould is completely wrong. But they’re not going to say he’s completely right either.”

Evolution in a Test Tube

Natural experiments can be revealing, but artificial experiments can be precise. Scientists can put organisms in exactly the same conditions and then watch evolution unfold. Microbes work best for this kind of research because scientists can rear billions of them in a single flask and the microbes can go through several generations in a single day. The most spectacular of these experiments has been going on for 26 years — and more than 60,000 generations — in the lab of Richard Lenski at Michigan State University.

Lenski launched the experiment with a single E. coli microbe. He let it divide into a dozen genetically identical clones that he then placed in a dozen separate flasks. Each flask contained a medium — a cocktail of chemicals mixed into water — that Lenski created especially for the experiment. Among other ingredients, it contained glucose for the bacteria to feed on. But it was a meager supply, which ran out after just a few hours. The bacteria then had to eke out their existence until the next morning, when Lenski or his students transferred a little of the microbe-laced fluid into a fresh flask. With a new supply of glucose, they could grow for a few more hours. Lenski and his students at Michigan State have been repeating this chore every day since.

At the outset, Lenski wasn’t sure what would happen, but he had his suspicions. He expected mutations to arise randomly in each line of bacteria. Some would help the microbes reproduce faster while others would be neutral or even harmful. “I imagined they’d be running off in one direction or another,” Lenski said.

In other words, Lenski thought the tape of life would replay differently with each rewind. But that’s not what happened. What Lenski witnessed was strikingly similar to the evolution that Jonathan Losos has documented in the Caribbean.

Lenski and his students have witnessed evolutionary oddities arise in their experiment — microbial versions of the Cuban fishing lizards, if you will. In 2003, Lenski’s team noticed that one line of bacteria had abruptly switched from feeding on glucose to feeding on a compound called citrate. The medium contains citrate to keep iron in a form that the bacteria can absorb. Normally, however, the bacteria don’t feed on the citrate itself. In fact, the inability to feed on citrate in the presence of oxygen is one of the defining features of E. coli as a species.

But Lenski has also observed evolution repeat itself many times over in his experiment. All 12 lines have evolved to grow faster on their meager diet of glucose. That improvement has continued to this day in the 11 lines that didn’t shift to citrate. Their doubling time — the time it takes for them to double their population — has sped up 70 percent. And when Lenski and his students have pinpointed the genes that have mutated to produce this improvement, they are often the same from one line to the next.

“That’s not at all what I expected when I started the experiment,” Lenski said. “I evidently was wrong-headed.”

Getting Complex Without Getting Random

Lenski’s results have inspired other scientists to set up more complex experiments. Michael Doebeli, a mathematical biologist at the University of British Columbia, wondered how E. coli would evolve if it had two kinds of food instead of just one. In the mid-2000s, he ran an experiment in which he provided glucose (the sole staple of Lenski’s experiment) and another compound E. coli can grow on, known as acetate.

Doebeli chose the two compounds because he knew that E. coli treats them very differently. When given a choice between the two, it will devour all the glucose before switching on the molecular machinery for feeding on acetate. That’s because glucose is a better source of energy. Feeding on acetate, by contrast, E. coli can only grow slowly.

Something remarkable happened in Doebeli’s experiment — and it happened over and over again. The bacteria split into two kinds, each specialized for a different way of feeding. One population became better adapted to growing on glucose. These glucose-specialists fed on the sugar until it ran out and then slowly switched over to feeding on acetate. The other population became acetate-specialists; they evolved to switch over to feeding on acetate even before the glucose supply ran out and could grow fairly quickly on acetate.

When two different kinds of organisms are competing for the same food, it’s common for one to outcompete the other. But in Doebeli’s experiment, the two kinds of bacteria developed a stable coexistence. That’s because both strategies, while good, are not perfect. The glucose-specialists start out growing quickly, but once the glucose runs out, they slow down drastically. The acetate-specialists, on the other hand, don’t get as much benefit from the glucose. But they’re able to grow faster than their rivals once the glucose runs out.

Doebeli’s bacteria echoed the evolution of lizards in the Caribbean. Each time the lizards arrived on an island, they diversified into many of the same forms, each with its own set of adaptations. Doebeli’s bacteria diversified as well — and did so in flask after flask.

To get a deeper understanding of this predictable evolution, Doebeli and his postdoctoral researcher, Matthew Herron, sequenced the genomes of some of the bacteria from these experiments. In three separate populations they discovered that the bacteria had evolved in remarkable parallel. In every case, many of the same genes had mutated.

Although Doebeli’s experiments are more complex than Lenski’s, they’re still simple compared with what E. coli encounters in real life. E. coli is a resident of the gut, where it feeds on dozens of compounds, where it coexists with hundreds of other species, where it must survive changing levels of oxygen and pH, and where it must negotiate an uneasy truce with our immune system. Even if E. coli’s evolution might be predictable in a flask of glucose and acetate, it would be difficult to predict how the bacteria would evolve in the jungle of our digestive system.

And yet scientists have been surprised to find that bacteria evolve predictably inside a host. Isabel Gordo, a microbiologist at the Gulbenkian Institute of Science in Portugal, and her colleagues designed a clever experiment that enabled them to track bacteria inside a mouse. Mice were inoculated with a genetically identical population of E. coli clones. Once the bacteria arrived in the mice’s guts, they started to grow, reproduce and evolve. And some of the bacteria were carried out of the mouse’s body with its droppings. The scientists isolated the experimental E. coli from the droppings. By examining the bacteria’s DNA, the scientists could track their evolution from one day to the next.

The scientists found that it took only days for the bacteria to start evolving. Different lineages of E. coli picked up new mutations that made them reproduce faster than their ancestors. And again and again, they evolved many of the same traits. For example, the original E. coli couldn’t grow if it was exposed to a molecule called galactitol, which mammals make as they break down sugar. However, Gordo’s team found that as E. coli adapted to life inside a mouse, it always evolved the ability to withstand galactitol. The bacteria treated a living host like one of Lenski’s flasks — or an island in the Caribbean.

Evolution’s Butterfly Effect

Each new example of predictable evolution is striking. But, as Losos warned, we can’t be sure whether scientists have stumbled across a widespread pattern in nature. Certainly, testing more species will help. But Doebeli has taken a very different approach to the question: He’s using math to understand how predictable evolution is overall.

Doebeli’s work draws on pioneering ideas that geneticists like Sewall Wright developed in the early 1900s. Wright pictured evolution like a hilly landscape. Each point on the landscape represents a different combination of traits: the length of a lizard’s legs versus the width of its trunk, for example. A population of lizards might be located on a spot on the landscape that represents long legs and a narrow trunk. Another spot on the landscape would represent short legs and a narrow trunk. And in another direction, there’s a spot representing long legs and a thick trunk.

The precise combinations of traits in an organism will influence its success at reproducing. Wright used the elevation of a spot on the evolutionary landscape to record that success. An evolutionary landscape might have several peaks, each representing one of the best possible combinations. On such a landscape, natural selection always pushes populations up hills. Eventually, a population may reach the top of a hill; at that point, any change will lead to fewer offspring. In theory, the population should stay put.
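Wright's picture can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the two-peak `fitness` function, the trait names `legs` and `trunk`, and the step size are all made up for the sketch and do not come from Wright or Doebeli. A population is moved, one small step at a time, to whichever neighboring trait combination has the highest elevation, and it stops when no neighbor is better.

```python
import math

def fitness(legs: float, trunk: float) -> float:
    """Hypothetical landscape with two peaks; elevation stands for reproductive success."""
    peak_a = math.exp(-((legs - 1.0) ** 2 + (trunk - 1.0) ** 2))
    peak_b = 0.8 * math.exp(-((legs + 1.0) ** 2 + (trunk + 1.0) ** 2))
    return peak_a + peak_b

def climb(legs: float, trunk: float, step: float = 0.05, max_steps: int = 1000):
    """Step to the best neighboring trait combination until no neighbor is higher."""
    moves = [(step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)]
    for _ in range(max_steps):
        best = max(
            ((legs + dx, trunk + dy) for dx, dy in moves),
            key=lambda point: fitness(*point),
        )
        if fitness(*best) <= fitness(legs, trunk):
            break  # the population sits on a hilltop and stays put
        legs, trunk = best
    return legs, trunk

# Two populations dropped on different slopes end up on different hilltops.
peak_near_a = climb(0.4, 0.3)
peak_near_b = climb(-0.5, -0.2)
print(peak_near_a, peak_near_b)
```

On a landscape that never changes, the outcome depends only on which slope a population starts on, which is what makes this version of the picture look so easy to forecast.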

The future of evolution might seem easy to predict on such a landscape. Scientists could simply look at the slope of the evolutionary landscape and draw a line up the nearest hill.

“This view is just simply wrong,” said Doebeli.

That’s because the population’s evolution changes the landscape. If a population of bacteria evolves to feed on a new kind of food, for example, then the competition for that food becomes fierce. The benefit of specializing on that food goes down, and the peak collapses. “It’s actually the worst place to be,” Doebeli said.

To keep climbing uphill, the population has to veer onto a new course, toward a different peak. But as it travels in a new direction, it alters the landscape yet again.

Recently, Doebeli and Iaroslav Ispolatov, a mathematician at the University of Santiago in Chile, developed a model to understand how evolution works under these more complicated conditions. Their analysis suggests that evolution is a lot like the weather — in other words, it’s difficult to predict.

In the early 1960s, a scientist at the Massachusetts Institute of Technology named Edward Lorenz developed one of the first mathematical models of weather. He hoped that it would reveal repeatable patterns that would help meteorologists predict the weather more accurately.

But Lorenz discovered just the opposite. Even a tiny change to the initial conditions of the model led, in time, to drastically different kinds of weather. In other words, Lorenz had to understand the model’s initial conditions with perfect accuracy to make long-term predictions about how it would change. Even a slight error would ruin the forecast.

Mathematicians later dubbed this sensitivity chaos. They would find that many systems, even surprisingly simple ones, behave chaotically. One essential ingredient for chaos is feedback: the ability for one part of the system to influence another, and vice versa. Feedback amplifies even tiny differences into big ones. When Lorenz presented his results, he joked that the flap of a butterfly’s wings in Brazil could set off a tornado in Texas.
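This sensitivity is easy to reproduce on a laptop. The sketch below uses the classic three-variable Lorenz equations with their textbook parameter values, which may not be the exact weather model described above; the starting point, the size of the nudge, and the function names are assumptions chosen for illustration. Two runs begin a hair apart, and the code records how far apart they drift.

```python
import math

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the three-variable Lorenz system by one fourth-order Runge-Kutta step."""
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2))
    k3 = deriv(shifted(state, k2, dt / 2))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def separations(steps=5000, nudge=1e-8):
    """Run two nearly identical starting states and record the gap at every step."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + nudge, 1.0, 1.0)  # the "butterfly": a tiny change in one variable
    gaps = []
    for _ in range(steps):
        gaps.append(math.dist(a, b))
        a, b = lorenz_step(a), lorenz_step(b)
    return gaps

gaps = separations()
print(f"initial gap: {gaps[0]:.1e}, largest gap: {max(gaps):.1f}")
```

The two runs follow the same rules exactly; only the starting point differs, and by an amount far smaller than any real measurement could detect.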

Evolution has feedbacks, too. A population evolves to climb the evolutionary landscape, but its changes alter the landscape itself. To see how these feedbacks affected evolution, Doebeli and Ispolatov created their own mathematical models. They would drop populations onto the evolutionary landscape at almost precisely the same spot. And then they followed the populations as they evolved.

In some trials, the scientists only tracked the evolution of a few traits, while in others, they tracked many. They found that in the simple models, the populations tended to follow the same path, even though they started out in slightly different places. In other words, their evolution was fairly easy to predict.

But when the scientists tracked the evolution of many traits at once, that predictability disappeared. Despite starting out under almost identical conditions, the populations veered off on different evolutionary paths. In other words, evolution turned to chaos.

Doebeli and Ispolatov’s research suggests that for the most part, evolution is too chaotic to be predicted with any great accuracy. If they are right, then the successes that scientists like Losos and Lenski have had in finding predictable evolution are the exceptions that prove the rule. The future of evolution, for the most part, is as fundamentally unknowable as the future of the weather.

This conclusion may seem strange coming from Doebeli. After all, he has conducted experiments on E. coli that have shown just how predictable evolution can be. But he sees no contradiction. “It’s just a matter of time scales,” he said. “Over short periods of time, it is predictable, if you have enough information. But you can’t predict it over long periods of time.”

Darwin’s Prophets

Even over short periods of time, accurate forecasts can save lives. Meteorologists can make fairly reliable predictions about treacherous weather a few days in advance. That can be enough time to evacuate a town ahead of a hurricane or lay in supplies for a blizzard.

Richard Lenski thinks that recent studies raise the question of whether evolutionary forecasting could also provide practical benefits. “I think the answer is definitely yes,” he said.

To predict which strains would dominate the 2002-2003 flu season (right), Michael Lässig and Marta Łuksza counted the number of beneficial mutations in the strains from the previous season (left).

One of the most compelling examples comes from Lässig. Using his physics background, he is working on a way to forecast the flu.

Worldwide, the flu kills as many as 500,000 people a year. Outside of the tropics, infections cycle annually from a high in winter to a low in summer. Flu vaccines can offer some protection, but the rapid evolution of the influenza virus makes it a moving target for vaccination efforts.

The influenza virus reproduces by invading the cells in our airway and using their molecular machinery to make new viruses. It’s a sloppy process, which produces many new mutants. Some of their mutations are harmful, crippling the viruses so that they can’t reproduce. But other mutations are harmless. And still others will make new flu viruses even better at making copies of themselves.

As the flu virus evolves, it diverges into many different strains. A vaccine that is effective against one strain will offer less protection against others. So vaccine manufacturers try to provide the best defense each flu season by combining the three or four most common strains of the flu.

There’s a problem with this practice, however. Manufacturing a new season’s flu vaccines takes several months. In the United States and other countries in the Northern Hemisphere, vaccine manufacturers must decide in February which strains to use for the flu season that starts in October. They often make the right prediction. But sometimes a strain that’s not covered by the vaccine unexpectedly comes to dominate a flu season. “If something goes wrong, it can cost thousands of lives,” Lässig said.

A few years ago, Lässig started to study the vexing evolution of the flu. He focused his attention on the rapidly evolving proteins that stud the shell of the flu virus, called hemagglutinin. Hemagglutinin latches on to receptors on our cells and opens up a passageway for the virus to invade.

When we get sick with the flu, our immune system responds by building antibodies that grab onto the tip of the hemagglutinin protein. The antibodies prevent the viruses from invading our cells and also make it easier for immune cells to detect the viruses and kill them. When we get flu vaccines, they spur our immune system to make those antibodies even before we get sick so that we’re ready to wipe out an infection as soon as it starts.

Scientists have been sequencing hemagglutinin genes from flu seasons for more than 40 years. Poring over this trove of information, Lässig was able to track the evolution of the viruses. He found that most mutations that altered the tip of the hemagglutinin protein helped the viruses reproduce more, probably because they made it difficult for antibodies to grab onto them. Escaping the immune system, they can make more copies of themselves.

Michael Lässig lecturing.

Each strain of the flu has its own collection of beneficial mutations. But Lässig noticed that the viruses also carry harmful mutations in their hemagglutinin gene. Those harmful mutations make hemagglutinin less stable and thus less able to open up cells for invasion.

It occurred to Lässig that these mutations might determine which strains would thrive in the near future. Perhaps a virus with more beneficial mutations would be more likely to escape people’s immune systems. And if they escaped destruction, they would make more copies of themselves. Likewise, Lässig theorized, the more harmful mutations a virus had, the more it would struggle to invade cells.

If that were true, then it might be possible to predict which strains would become more or less common based on how many beneficial and harmful mutations they carried. Working with Columbia University biologist Marta Łuksza, he came up with a way to score the evolutionary potential of each strain of the flu. For each beneficial mutation, a strain earned a point. For each harmful one, Lässig and Łuksza took a point away.

The scientists examined thousands of strains of the flu that have been sampled since 1993. They would calculate the score for every strain in a given year and then use that score to predict how it would fare the following year. They correctly forecast whether a strain would grow or decline about 90 percent of the time. “It’s a simple procedure,” Lässig said. “But it works reasonably well.”
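The point-counting idea can be sketched as follows. This is a simplified illustration, not Lässig and Łuksza's actual method: the strain names and mutation counts are invented, and the rule of comparing each score with the season's average is an assumption added here to turn a score into a grow-or-decline call.

```python
from dataclasses import dataclass

@dataclass
class Strain:
    name: str
    beneficial: int  # number of beneficial mutations (hypothetical counts)
    harmful: int     # number of harmful mutations (hypothetical counts)

def score(strain: Strain) -> int:
    """One point for each beneficial mutation, minus one for each harmful one."""
    return strain.beneficial - strain.harmful

def forecast(strains: list[Strain]) -> dict[str, str]:
    """Assumed rule: strains scoring above the season's mean are expected to grow."""
    mean = sum(score(s) for s in strains) / len(strains)
    return {s.name: ("grow" if score(s) > mean else "decline") for s in strains}

# Made-up season with three strains.
season = [Strain("A", 5, 1), Strain("B", 2, 3), Strain("C", 3, 2)]
predictions = forecast(season)
print(predictions)
```

A forecast of this kind can then be checked against what actually happened the following year, which is how a success rate like the one quoted above would be measured.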

Lässig and his colleagues are now exploring ways to improve their forecast. Lässig hopes to be able to make predictions about future flu seasons that the World Health Organization could consult as they decide which strains should be included in flu vaccines. “It’s just a question of a few years,” he said.

The flu isn’t the only disease that evolutionary forecasting could help combat. Bacteria are rapidly evolving resistance to antibiotics. If scientists can predict the path that the microbes will take, they may be able to come up with strategies for putting up roadblocks.

Forecasting could also be useful in fighting cancer. When cells turn cancerous, they undergo an evolution of their own. As cancer cells divide, they sometimes gain mutations that let them grow faster or escape the immune system’s notice. It may be possible to forecast how tumors will evolve and then plan treatments accordingly.

Beyond its practical value, Lässig sees a profound importance to being able to predict evolution. It will bring the science of evolutionary biology closer to other fields like physics and chemistry. Lässig doesn’t think that he’ll be able to forecast evolution as easily as he can the motion of the moon, but he hopes that there’s much about evolution that will prove to be predictable. “There’s going to be a boundary, but we don’t know where the boundary is,” he said.
