Ten Hot Big Data Trends: database and data-management technologies.


Here are the top ten big data trends:

Hadoop is becoming the underpinning for distributed big data management. Hadoop is a distributed file system that can be used together with MapReduce to process and analyze massive amounts of data, enabling the big data trend. Hadoop will be tightly integrated into data warehousing technologies so that structured and unstructured data can be combined more effectively.

Big data makes it possible to leverage data from sensors to change business outcomes. More and more businesses are using highly sophisticated sensors on the equipment that runs their operations. New advances in big data technology are making it possible to analyze all this data to get advance warning of problems that can be fixed to protect the business.
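The sensor-alerting idea can be illustrated with a toy, single-machine sketch. The function name, the rolling-mean baseline, and the threshold are invented for illustration; a production pipeline would run comparable logic in parallel over many high-volume feeds:

```python
from collections import deque

def detect_anomalies(readings, window=5, threshold=2.0):
    """Flag readings that deviate from the rolling mean of recent values."""
    history = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mean = sum(history) / window
            if abs(value - mean) > threshold:
                alerts.append((i, value))
                continue  # keep the spike out of the baseline
        history.append(value)
    return alerts

# A temperature feed with one spike at index 6
feed = [20.1, 20.3, 20.2, 20.4, 20.2, 20.3, 35.0, 20.2]
print(detect_anomalies(feed))  # [(6, 35.0)]
```

Real systems would add richer statistics (variance-scaled thresholds, seasonality), but the shape of the computation is the same: compare each new reading against a model of recent behavior and raise an alert early enough to act.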

Big data can help a company initiative become a real-time action to increase revenue. Companies in industries such as retail are using real-time streaming data analytics to monitor customer actions and offer incentives to increase revenue per customer.

Big data can be integrated with historical data warehouses to transform planning. Big data can give a company a better understanding of massive amounts of information about its business. This detail about the current state of the business can be combined with historical data to get a full view of the context for business change.

Big data can change the way diseases are managed by adding predictive analytics. Increasingly, healthcare practitioners are looking to big data solutions to gain insights into disease by comparing symptoms and test results against databases of results from hundreds of thousands of other cases. This allows practitioners to predict outcomes more quickly and save lives.
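As a rough illustration of the case-comparison idea, the sketch below ranks a tiny, entirely hypothetical case database by symptom overlap. The Jaccard measure and the data are stand-ins; real systems compare against hundreds of thousands of records with far richer features:

```python
def jaccard(a, b):
    """Similarity between two symptom sets: shared symptoms over all symptoms."""
    return len(a & b) / len(a | b)

# Hypothetical case database: (symptom set, recorded outcome)
cases = [
    ({"fever", "cough", "fatigue"}, "flu"),
    ({"fever", "rash"}, "measles"),
    ({"cough", "wheezing"}, "asthma"),
]

def predict(symptoms, k=1):
    """Return the outcomes of the k most similar historical cases."""
    ranked = sorted(cases, key=lambda c: jaccard(symptoms, c[0]), reverse=True)
    return [outcome for _, outcome in ranked[:k]]

print(predict({"fever", "cough"}))  # ['flu']
```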

Cloud computing will change the way that data is managed in the future. Cloud computing is invaluable as a tool to support the growth of big data. Increasingly, cloud services that are optimized for data will mean that many more services and delivery models will make big data more practical for companies of all sizes.

Security and governance will be the difference between success and failure of businesses leveraging big data. Big data can be a huge benefit, but it isn't risk-free. Companies will discover that if they are not careful, it is possible to expose private information through big data analysis. Companies must balance the need to analyze results with best practices for security and governance.

Veracity, or truthfulness, of big data will become the most important issue for the coming year. Many companies can get carried away with the ability to analyze massive amounts of data and return compelling results that predict business outcomes. Therefore, companies will find that the truthfulness of the data must become a top priority, or decision making will suffer.

As big data moves out of the experimental stage, more packaged offerings will be developed. Most big data projects initiated over the past few years have been experimental, with companies cautiously working with new tools and technology. Now big data is about to enter the mainstream, and lots of packaged big data offerings will flood the market.

Use cases and innovative new ways to apply big data will explode. Early successes with big data in sectors such as manufacturing, retail, and healthcare will lead to many more industries examining ways to leverage massive amounts of data to transform their markets.


Using Big Data to Make Better Pricing Decisions


Harnessing the flood of data available from customer interactions allows companies to price appropriately and reap the rewards.

It's hard to overstate the importance of getting pricing right. On average, a 1 percent price increase translates into an 8.7 percent increase in operating profits (assuming no loss of volume, of course). Yet we estimate that up to 30 percent of the thousands of pricing decisions companies make every year fail to deliver the best price. That's a lot of lost revenue. And it's particularly troubling considering that the flood of data now available provides companies with an opportunity to make significantly better pricing decisions. For those able to bring order to big data's complexity, the value is substantial.
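The 1 percent / 8.7 percent relationship is simple arithmetic once you notice that it implies an average operating margin of roughly 11.5 percent (about 1/0.087): with volume and costs held constant, the entire price increase falls straight through to profit. A quick check with illustrative figures:

```python
# Illustrative figures only: a company with an ~11.5% operating margin
revenue = 100.0
operating_profit = 11.5
costs = revenue - operating_profit

new_revenue = revenue * 1.01      # 1 percent price increase, same volume
new_profit = new_revenue - costs  # costs are unchanged
lift = (new_profit - operating_profit) / operating_profit
print(f"{lift:.1%}")  # 8.7%
```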

We're not suggesting it's easy: the number of customer touchpoints keeps exploding as digitization fuels growing multichannel complexity. Yet price points need to keep pace. Without uncovering and acting on the opportunities big data presents, many companies are leaving millions of dollars of profit on the table. The secret to increasing profit margins is to harness big data to find the best price at the product level, not the category level, rather than drown in the numbers flood.

Too Big to Succeed
For every product, companies should be able to find the optimal price that a customer is willing to pay. Ideally, they would factor in highly specific insights that would influence the price (the cost of the next-best competitive product versus the value of the product to the customer, for example) and then arrive at the best price. Indeed, for a company with a handful of products, this kind of pricing approach is straightforward.

It's far more problematic when product numbers balloon. About 75 percent of a typical company's revenue comes from its standard products, which often number in the thousands. Time-consuming, manual practices for setting prices make it virtually impossible to see the pricing patterns that can unlock value. It's simply too overwhelming for large companies to get granular and manage the complexity of these pricing variables, which change constantly, for thousands of products. At its core, this is a big data problem.

Many marketers end up simply burying their heads in the sand. They develop prices based on simplistic factors such as the cost to produce the product, standard margins, prices for similar products, volume discounts, and so on. They fall back on old practices to manage the products as they always have, or cite "market prices" as an excuse for not attacking the issues. Perhaps worst of all, they rely on "tried and tested" historical methods, such as a universal 10 percent price hike on everything.

"What happened in practice then was that each year we had price increases based on scale and volume, but not based on science," says Roger Britschgi, head of sales operations at Linde Gases. "Our people simply didn't believe it was possible to do it differently. And, quite frankly, our people were not well prepared to convince our customers of the need to increase prices."

Four Steps to Turn Data into Profits
The secret to better pricing is understanding fully the data now at a company's disposal. It requires not zooming out but zooming in. As Tom O'Brien, group vice president and general manager for marketing and sales at Sasol, said of this approach, "The [sales] teams knew their prices, they may have known their volumes, but this was something more: very granular data, literally from each and every invoice, by product, by customer, by packaging."

In fact, some of the most exciting examples of using big data in a B2B context actually transcend pricing and touch on other parts of a company's commercial engine. For example, "dynamic deal scoring" provides price guidance at the level of individual deals, along with decision-escalation points, incentives, performance scoring, and more, based on a set of similar win/loss deals. Using smaller, relevant deal samples is essential, as the factors tied to any one deal will vary, rendering an overarching set of deals useless as a benchmark. We have seen this used in the technology sector with great success, yielding increases of four to eight percentage points in return on sales (versus same-company control groups).
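A minimal sketch of the deal-scoring idea: derive price guidance from a small pool of comparable won deals rather than from the whole heterogeneous deal history. The deal records, the segmentation fields, and the use of a median are all invented for illustration:

```python
from statistics import median

# Hypothetical historical deals: (segment, volume_band, unit_price, won)
history = [
    ("retail", "high", 9.10, True),
    ("retail", "high", 9.40, True),
    ("retail", "high", 10.20, False),  # lost deal: priced too high
    ("retail", "low", 11.00, True),
    ("industrial", "high", 8.50, True),
]

def suggest_price(segment, volume_band):
    """Suggest a price from comparable won deals only, since factors
    tied to unrelated deals make the full history useless as a benchmark."""
    comparable = [price for seg, vol, price, won in history
                  if seg == segment and vol == volume_band and won]
    return median(comparable) if comparable else None

print(suggest_price("retail", "high"))  # 9.25, midway between the two wins
```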

To get sufficiently granular, companies need to do four things.

Listen to the data. Setting the best prices is not a data challenge (companies generally already sit on a treasure trove of data); it's an analysis challenge. The best B2C companies know how to interpret and act on the wealth of data they have, but B2B companies tend to manage data rather than use it to drive decisions. Good analytics can help companies identify how factors that are often overlooked, such as the broader economic situation, product preferences, and sales-representative negotiations, reveal what drives prices for each customer segment and product.

Automate. It's too expensive and time-consuming to analyze thousands of products manually. Automated systems can identify narrow segments, determine what drives value for each one, and match that with historical transactional data. This allows companies to set prices for clusters of products and segments based on data. Automation also makes it much easier to replicate and fine-tune analyses, so it's not necessary to start from scratch every time.
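The automation step can be sketched as follows: group transactions into narrow segments and derive a reference price per segment from historical data. The records, the cost-band rule, and the plain mean are illustrative stand-ins for a real segmentation model:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transactions: (sku, category, unit_cost, realized_price)
transactions = [
    ("A1", "valves", 4.0, 6.1),
    ("A2", "valves", 4.2, 6.4),
    ("B1", "valves", 9.0, 12.8),
    ("C1", "seals", 1.0, 1.9),
    ("C2", "seals", 1.1, 2.0),
]

def cost_band(cost):
    return "high" if cost >= 5 else "low"

def cluster_prices(records):
    """Group SKUs into narrow (category, cost band) segments and derive
    a reference price per segment from historical realized prices."""
    clusters = defaultdict(list)
    for sku, category, cost, price in records:
        clusters[(category, cost_band(cost))].append(price)
    return {seg: round(mean(prices), 2) for seg, prices in clusters.items()}

print(cluster_prices(transactions))
# {('valves', 'low'): 6.25, ('valves', 'high'): 12.8, ('seals', 'low'): 1.95}
```

In practice the segments would come from a clustering model over many more attributes, and the reference price from a win-rate-aware optimization rather than a plain average; the point is that the grouping and pricing logic, once encoded, can be rerun and fine-tuned instead of rebuilt from scratch.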

Build skills and confidence. Implementing new prices is as much a communications challenge as an operational one. Successful companies overinvest in thoughtful change programs to help their sales forces understand and embrace new pricing approaches. Companies need to work closely with sales reps to explain the reasons for the price recommendations and how the system works, so that they trust the prices enough to sell them to their customers. Equally important is developing a clear set of communications to provide a rationale for the prices in order to highlight value, and then tailoring those arguments to the customer.

Intensive negotiation training is also critical for giving sales reps the confidence and tools to make convincing arguments when speaking with customers. The best leaders accompany sales reps to the most difficult customers and focus on getting quick wins so that sales reps develop the confidence to adopt the new pricing approach. "It was important to show that leadership was behind this new approach," says Robert Krieger, managing director of PanGas AG. "And we did this by joining visits to difficult customers. We were able to not only help our sales reps but also show how the argumentation worked."

Actively manage performance. To improve performance management, companies need to support the sales force with useful targets. The greatest impact comes from ensuring that the front line has a transparent view of profitability by customer and that the sales and marketing organization has the right analytical skills to recognize and capitalize on the opportunity. The sales force also needs to be empowered to adjust prices itself rather than relying on a central team. This requires a degree of creativity in devising a customer-specific price strategy, as well as an entrepreneurial mindset. Incentives may also need to be changed alongside pricing policies and performance measurements.

We've seen companies in industries as diverse as software, chemicals, construction materials, and telecommunications achieve impressive results by using big data to inform better pricing decisions. All had enormous numbers of SKUs and transactions, as well as a fragmented portfolio of customers; all saw a profit-margin lift of between 3 and 8 percent from setting prices at much more granular product levels. In one case, a European building-materials company set prices that increased margins by up to 20 percent for selected products. To get the price right, companies should take advantage of big data and invest enough resources in supporting their sales reps, or they may find themselves paying the high price of lost profits.

IBM’s new SaaS service on SoftLayer


IBM’s new SaaS service on SoftLayer offers data management in the cloud





IBM Corp. launched Monday a range of new cloud services designed for the enterprise that are based on SoftLayer infrastructure. A year after its US$2 billion acquisition, SoftLayer has become the driving force behind IBM’s rapid acceleration to cloud leadership.


With big data creating demand for cloud, SoftLayer will play a key role in delivering IBM’s data and analytics portfolio to clients faster, more effectively and efficiently.

IBM’s cloud revenue went up more than 50 percent in IBM’s first quarter. For cloud delivered as a service, first-quarter annual run rate of $2.3 billion doubled year to year.

IBM’s CEO Ginni Rometty said in April that the company had continued to take actions to transform parts of the business, and to shift aggressively to its strategic growth areas, including cloud, big data analytics, social, mobile and security.

"As we move through 2014, we will begin to see the benefits from these actions," Rometty said. "Over the long term, they will position us to drive growth and higher value for our clients."

IBM will make available, via the Bluemix developer platform and IBM marketplace, the Watson Engagement Advisor on SoftLayer, which allows organizations to gain timely and actionable insights from big data, transforming the client experience through natural conversational interactions with systems that get smarter with use.

Running on IBM’s POWER8 processor, IBM Power Systems integrated into SoftLayer’s infrastructure will handle big data, analytics and cognitive requirements in the cloud with unprecedented speed.

Watson Developer Cloud on SoftLayer allows access for third-party developers, entrepreneurs, academics, and system integrators looking to harness Watson's cognitive capabilities in the products and services they bring to market.

IBM now provides over 300 services within the IBM cloud marketplace, which is based on SoftLayer. These include data and analytics offerings such as the IBM multi-enterprise Relationship Management SaaS, which connects and manages shared business processes across a variety of communities; Time Series Database, which connects applications to the Internet of Things; and Analytics Warehouse, which provides an agile platform for data warehousing and analytics.

Aspera high-speed transfer technology is now also available on SoftLayer, allowing users to move large unstructured and structured data sets with maximum speed and security, regardless of data size, distance or network conditions.

IBM also unveiled a new software-defined storage-as-a-service offering on IBM SoftLayer, code-named Elastic Storage on Cloud, to give organizations access to a fully supported, ready-to-run storage environment. It includes SoftLayer bare metal resources and high-performance data management, and allows organizations to move data between their on-premise infrastructure and the cloud.

Elastic Storage on Cloud is optimized for technical computing and analytics workloads, providing more storage capability in a more cost-effective way. Organizations can now easily meet sudden spikes in storage demands without needing to purchase or manage in-house infrastructure.

With on-demand access to Elastic Storage resources, organizations working on high performance computing and analytics such as seismic data processing, credit risk management and financial analysis, weather modeling, genomics and scientific research are able to quickly adapt to changing business needs and get their products or research out of the door faster.

Elastic Storage on Cloud is available starting Tuesday, with pricing starting at $13,735 per 100 TB per month, which includes software licenses, SoftLayer infrastructure and full support.

SoftLayer also expanded hourly billing for bare-metal servers bringing critical pay-as-you-go benefits of virtual server consumption to dedicated resources. Bare metal servers provide the increased performance and privacy that many enterprises desire.

IBM Cloud Modular Management is a fully automated service management system that helps companies govern new cloud application environments. It gives companies the choice and flexibility to pick the services they want to manage on their own or have IBM manage for them.

Jumpgate from SoftLayer will also play a key role in helping businesses build their own hybrid cloud environments. Jumpgate allows for interoperability between clouds by providing compatibility between the OpenStack API and a provider’s proprietary API.



Why DevOps Is Becoming a Critical Factor in New Data Centers




Practitioner Tip: Work with Other Teams and Find Ways to Build Empathy
Building bridges between teams will increase your understanding of the challenges at every point in the life cycle. As a developer, try to put yourself in the shoes of the operations team: How will they monitor and deploy your software? As an ops person, think about how to help developers get feedback on whether their software will work in production.



Manager Tip: Build Trust With Your Counterparts on Other Teams
Building trust between teams is the most important thing you can do, and it must be developed over time. Trust is built on kept promises, open communication, and behaving predictably even in stressful situations. Your teams will be able to work better together, and the partnership will signal to the organization that cross-functional collaboration is valued.

DevOps is a software development practice that uses automation to focus on communication, collaboration, and integration between software developers and IT operations professionals. The goal is to maximize the predictability, efficiency, security, and maintainability of operational processes. Analyzing this trend, Puppet Labs has released its 2014 State of DevOps report, which includes a 9,200-respondent survey. The survey revealed that high-performing IT departments not only offer a clear competitive advantage, but that respondents in the "high performing" group reported that their organizations are twice as likely to exceed profitability, market share, and productivity goals. The report also found that, for the second consecutive year, high-performing IT organizations deploy code 30 times more frequently with 50 percent fewer failures. With so much riding on the success or failure of IT, many in the profession are looking for ways to improve processes in order to operate at peak levels.



Practitioner Tip: Make Invisible Work Visible
Record what you and your colleagues do to support cross-functional collaboration. If members of the dev and ops teams work together to solve a problem in the development environment, make sure to record and recognize what made that possible: an ops colleague taking an extra on-call shift, or an assistant ordering meals for a working session. These are nontrivial contributions and may be needed for effective collaboration.



Hot Big Information Trends database and data-management technologies.

Passive Income Systems

Right here are the top-ten big data trends:.

Hadoop is becoming the underpinning for dispersed huge data administration. Hadoop is a dispersed data system that can be used combined with MapReduce to refine and evaluate enormous amounts of data, allowing the large data trend. Hadoop will certainly be securely integrated into data warehousing innovations so that structured and unstructured information can be incorporated more effectively.

Large information makes it feasible to leverage information from sensors to alter business end results. A growing number of companies are making use of highly advanced sensing units on the equipment that runs their operations. New advancements in huge data modern technology are making it possible to examine all this data to get progressed notice of issues that can be taken care of to secure business.

Huge data can help a company effort become a real-time action to increase revenue.Com panies in markets such as retail are using real-time streaming data analytics to monitor client actions and deal motivations to improve revenue per consumer.

Big information could be incorporated with historical information warehouses to change planning. Huge information could give a business with a much better understanding of enormous amounts of information about their business. This details about the present state of the business can be integrated with historical data to obtain a full look at of the context for business change.

Large information could alter the means diseases are handled by adding anticipating analytics. Increasingly, healthcare specialists are aiming to huge data options to get understandings into disease by contrast symptoms and examination results to data sources of arise from hundreds of countless various other situations. This enables practitioners to faster forecast end results and save lives.

Cloud computer will certainly transform the method that information will certainly be handled in the future. Cloud computing is important as a tool to assist the development of huge data. Increasingly, cloud support services that are optimized for data will imply that many more services and distribution models will certainly make huge data more functional for firms of all sizes.

Safety and governance will be the distinction between success and failure of businesses leveraging huge information. Big data can be a significant advantage, yet it isn’t safe. Companies will certainly discover that if they are not careful, it is possible to subject personal details through huge information evaluation. Business need to balance the should examine results with best practices for safety and administration.

Honesty, or truthfulness, of large information will end up being the most vital issue for the coming year. Lots of firms could acquire carried away with the ability to examine enormous quantities of information and get back engaging outcomes that forecast business results. For that reason, business will certainly discover that the truthfulness of the information must become a top concern or choice production will endure.

As huge data vacates the experimental stage, even more packaged offerings will be developed. The majority of big data projects initiated over the previous few years have been experimental. Firms are carefully working with brand-new tools and technology. Now large information will get in the mainstream. Great deals of packaged huge information providings will certainly flood the market.

Use cases and innovative new ways to apply big data will explode. Early successes with big data in sectors such as manufacturing, retail, and healthcare will lead to many more industries looking at ways to leverage massive quantities of data to transform their markets.


Hadoop 101



Hadoop 101: Programming MapReduce with Native Libraries, Hive, Pig, and Cascading

June 06, 2013 • PRODUCTS • By Stacey Schneider

Apache Hadoop and all its flavors of distributions are the hottest technologies on the market. It's fundamentally changing how we store, use, and share data. It is pushing us all forward in many ways: how we socialize with friends, how science is zeroing in on new discoveries, and how industry is becoming more efficient.

But it is a major mind shift. I've had several conversations in the past two weeks with programmers and DBAs alike explaining these concepts. For those who have not yet experimented with it, the basic ideas of breaking apart databases and not using SQL can be equal parts confusing and intriguing. To that end, we're going to broaden this conversation and start to lay out some of the primary concepts that professionals new to Hadoop can use as a primer.

To do this, examples work best. So we are going to use a basic word count program to illustrate how programming works within the MapReduce framework in Hadoop. We will explore four coding approaches, using the native Hadoop library as well as alternative libraries such as Pig, Hive, and Cascading, so programmers can evaluate which approach works best for their needs and skills.

Basic Programming in MapReduce

In concept, the function of MapReduce is not some new method of computing. We are still dealing with data input and output. If you know basic batch processing, MapReduce is familiar ground: we collect data, perform some function on it, and put it somewhere. The difference with MapReduce is that the steps are a little different, and we perform the steps on terabytes of data across thousands of computers in parallel.

The typical introductory program or 'Hello World' for Hadoop is a word count program. Word count programs or functions do a few things: 1) look at a file with words in it, 2) determine what words are contained in the file, and 3) count how many times each word shows up, potentially ranking or sorting the results. For example, you could run a word count function on a 200-page book about software programming to see how many times the word "code" showed up and which other words were more or less common. A word count program like this is considered a simple program.
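To make that concrete, here is what such a simple, single-file word count might look like in Python (a sketch for illustration only; it is not code from Hadoop or from this article):

```python
from collections import Counter

def word_count(text):
    """Count how many times each word appears in a block of text."""
    words = text.lower().split()
    return Counter(words)

# Count words in a small sample "file"
sample = "the code and the tests and the docs"
counts = word_count(sample)
print(counts["the"])   # 3
print(counts["code"])  # 1
```

On a single machine and a single file, this is all there is to it; the interesting part, covered next, is what happens when the input no longer fits on one machine.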

The word counting problem becomes more complex when we want to run a word count function on 100,000 books, 100 million web pages, or many terabytes of data instead of a single file. For this volume of data, we need a framework like MapReduce to help us by applying the principle of divide and conquer: MapReduce essentially takes each chapter of each book, gives it to a different machine to count, and then aggregates the results on another set of machines. The MapReduce workflow for such a word count function follows these steps:

  1. The system takes input from a file system and splits it up across separate Map nodes
  2. The Map function or code is run and generates an output for each Map node—in the word count function, every word is listed and grouped by word per node
  3. This output represents a set of intermediate key-value pairs that are moved to Reduce nodes as input
  4. The Reduce function or code is run and generates an output for each Reduce node—in the word count example, the reduce function sums the number of times a group of words or keys occurs
  5. The system takes the outputs from each node to aggregate a final view
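The five steps above can be sketched as a toy, single-machine simulation in Python. This is only an illustration of the data flow, not Hadoop code: the two-string "cluster", the function names, and the split sizes are all invented for the example.

```python
from collections import defaultdict

def map_phase(split):
    """Step 2: emit an intermediate (word, 1) pair for every word in a split."""
    return [(word, 1) for word in split.lower().split()]

def shuffle(pairs):
    """Step 3: group intermediate pairs by key, as if moving them to Reduce nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Step 4: sum the counts for each word."""
    return {word: sum(values) for word, values in groups.items()}

# Step 1: the input is split across "nodes" (here, just two strings)
splits = ["to be or not to be", "to see or not to see"]

# Run map on every split, shuffle, then reduce; step 5 is the aggregated dict
intermediate = [pair for s in splits for pair in map_phase(s)]
final = reduce_phase(shuffle(intermediate))
print(final["to"])  # 4
print(final["be"])  # 2
```

The real framework does the same thing, except each `map_phase` and `reduce_phase` call runs on a different machine and the shuffle moves data over the network.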

So, where do we start programming?

There are really a few places we might start coding, but it depends on the scope of your system. It may be that you need a program that places data on the file system as input or removes it; however, data can also be moved manually. The main area we will start programming for is the Map and Reduce functions described above.

Of course, we must understand more about how storage and network are used as well as how data is split up, moved, and aggregated to ensure the entire unit of work functions and performs as we expect. These topics will be saved for future posts or you can dig into them on your own for now.

Code Examples—Hadoop, Pig, Hive, and Cascading

At a high level, people use the native Hadoop libraries to achieve the greatest performance and the most fine-grained control. Pig sits somewhere between the very SQL-like database language provided by Hive and the very Java-like programming API provided by Cascading. Below, we walk through each of the four approaches.

Native Hadoop Libraries

The native libraries give developers the most granular control over their code. Given that all the other approaches are essentially abstractions, this approach offers the least overhead and best performance. Most Hadoop workloads are not singular queries; rather, they are several queries strung together. For our simplistic example with a single query, the native library is likely the most efficient. However, once you have a more complex series of jobs with dependencies, some of the abstractions offer more developer assistance.

In the standard word count example from Hadoop's documentation, two basic things happen. First, the Mapper reads the data set line by line, and a StringTokenizer splits each line into words, which are emitted as key-value pairs; this is what generates the intermediate output. To clarify, there is a key-value pair for each instance of each word in the input file. Then the reducer receives the key-value pairs, sums the count for each word, and writes the results to disk.
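The same mapper/reducer division of labor can also be expressed outside Java, for example in the style of Hadoop Streaming, which pipes text through stand-alone mapper and reducer scripts via standard input and output. The sketch below simulates that pipeline in ordinary Python; the `sorted()` call stands in for Hadoop's shuffle, and the function names are illustrative, not part of any Hadoop API.

```python
def mapper(lines):
    """Mapper: read input line by line, emit 'word<TAB>1' for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    """Reducer: consume pairs sorted by key, sum the counts for each word."""
    current, total = None, 0
    for pair in pairs:
        word, count = pair.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

# In a real Streaming job, Hadoop sorts the mapper output by key
# before the reducer sees it; sorted() mimics that here.
mapped = sorted(mapper(["hello world", "hello hadoop"]))
print(list(reducer(mapped)))  # ['hadoop\t1', 'hello\t2', 'world\t1']
```

Note how the reducer only needs to compare each key against the previous one: because the shuffle delivers pairs sorted by key, all counts for a given word arrive together.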

Apache Pig

Apache Pig's programming language, referred to as Pig Latin, provides a higher level of abstraction for MapReduce programming. It is similar to SQL, but the code is procedural rather than declarative. It can be extended with User Defined Functions (UDFs) written in Java, Python, JavaScript, Ruby, or Groovy, and it includes tools for data execution, manipulation, and storage. Pig Latin can describe the same word count application as above, but far fewer lines of code are needed to read, tokenize, filter, and count the data.

Apache Hive

Hive is a project that Facebook started in 2008 to make Hadoop behave more like a traditional data warehouse. It provides an even more SQL-like interface for MapReduce programming. In the word count scenario, Hive reads data from the Hadoop Distributed File System (HDFS), creates a table of lines, and then does a select count on that table in a very SQL-like fashion; a lateral view applies the splits, eliminates spaces, groups, and counts. Each of these commands maps to the MapReduce functions covered above. Often considered the slowest of these approaches, Hive is being actively worked on with the goal of speeding it up as much as 100x.


Cascading

Cascading is neither a scripting nor a SQL-oriented language; it is a set of .jars that define data processing APIs and integration APIs, as well as a process planner and scheduler. As an abstraction of MapReduce, it may run slower than native Hadoop because of some overhead, but most developers don't mind, because its features help complete projects faster and with less wasted time. For example, Cascading has a fail-fast planner that refuses to run a Cascading Flow on the cluster if the data/field dependencies in the Flow are not all satisfied. It defines components and actions, sources, and outputs; as data goes from source to output, you apply transformations. In the word count case, lines, words, and counts are created and written to disk.

Just the Tip of the Iceberg

For Hadoop, there are many ways to skin this cat. These four examples are considered the more classic or standard platforms for writing MapReduce programs, probably because all except Cascading are Apache projects. However, many more exist. Even Pivotal has one in our Pivotal HD Hadoop distribution, called HAWQ. HAWQ is a true SQL engine that appeals to data scientists because of its familiarity and flexibility. It is also fast: HAWQ can leverage local disk, rather than HDFS, for temporarily storing intermediate results, so it is able to perform joins, sorts, and OLAP operations on data well beyond the total size of memory in the cluster.





Apache Hadoop has become the dominant platform for Big Data analytics in the last few years, thanks to its flexibility, reliability, scalability, and ability to match the requirements of developers, web startups, and enterprise IT. A fast and economical way to leverage the huge quantities of data created by new sources such as social networks, mobile sensors, and Internet of Things devices, Hadoop has become the preferred platform for storing and analyzing large unstructured datasets.

Originally developed at Yahoo!, Hadoop was quickly embraced by the open source community as well as other consumer-facing Internet giants such as Facebook. More recently, Hadoop has been adopted by enterprises that likewise need to derive actionable insight from the Big Data created by new data sources, technology innovations, cloud services, and business opportunities. IDC has predicted the Hadoop software market will be worth $813 million by 2016.

Hadoop is a game changer for enterprises, transforming the economics of large-scale data analytics. It eliminates data silos and lessens the need to move data between storage and analytics software, giving businesses a more holistic view of their customers and operations and leading to faster, more effective business insights. Its extensibility and countless integrations can power a new generation of data-aware business applications.

The software's “refreshingly unique approach to data management is transforming how companies store, process, analyze and share big data,” according to Forrester analyst Mike Gualtieri. “Forrester believes that Hadoop will become essential infrastructure for large enterprises.”

For companies using proprietary data solutions and staff familiar with SQL analytics tools, transitioning to Hadoop can be difficult in spite of its many advantages, and integration with existing infrastructure can present a significant challenge. To this end, Pivotal supplies its enterprise-grade Hadoop distribution, Pivotal HD, as either a standalone product or part of the Pivotal Big Data Suite.

Pivotal HD builds on Hadoop's solid foundation by adding features that ease enterprise adoption and use of the platform. It enables the Business Data Lake, letting companies bring their existing analytics tools to their data. Pivotal HD serves as the foundation for the Business Data Lake, providing real-time analytics with GemFire XD and a comprehensive set of advanced analytical toolsets with HAWQ, MADlib, OpenMPI, GraphLab, and even Spring XD. Featuring HAWQ, billed as the fastest SQL query engine on Hadoop, Pivotal HD speeds up data analytics, leverages existing skillsets, and significantly broadens Hadoop's capabilities. Pivotal GemFire brings real-time analytics to Hadoop, allowing companies to process data and make critical business decisions instantly.

While leveraging Hadoop's proven benefits, Pivotal HD adds features that ease adoption, boost efficiency, and provide robust administration tools. It supports leading data science tools such as MADlib, GraphLab (OpenMPI), and User-Defined Functions, including support for popular languages such as R, Java, and Python. Pivotal HD also integrates with Spring ecosystem projects such as Spring XD, easing the development of data-driven applications and services.

By allowing companies to collect and take advantage of both structured and unstructured data types, Pivotal HD makes possible a flexible, fault-tolerant, and scalable Business Data Lake. Pivotal's engineers, several of whom were integral to Hadoop's growth and development, have built an enterprise-grade Hadoop distribution. Read more about their continued work on Pivotal HD on the Pivotal blog.



Books and other resources to learn R


From Amy’s Page

12 Books and other resources to learn R

This article was originally posted on UCAnalytics. Link to full version is provided at the bottom.

1. R for Reference

R for Everyone: Advanced Analytics and Graphics – Jared P. Lander

YOU CANalytics Book Rating 5 Stars (5 / 5)

Jared Lander, in his book, wastes no time on base graphics (which come pre-installed with R), but jumps directly to the ggplot2 package (a much more advanced and sleek graphics package). This sets the tone for the book, i.e. don't learn things you won't use in real-life applications later. I highly recommend this book for a fast-paced experience learning R.

R in Action – Robert Kabacoff

YOU CANalytics Book Rating 5 Stars (5 / 5)

Here is another exceptional book for learning R on your own. I must say Robert Kabacoff, the author of this book, has done a phenomenal job with it. The organization of the book is immaculate and the presentation is friendly. I highly recommend either this book or R for Everyone to start your journey to learn R.

The R Book – Michael J. Crawley

YOU CANalytics Book Rating 4.8 Stars (4.8 / 5)

With close to a thousand pages and vast coverage, 'The R Book' could be called the Bible for R. This book starts with simple concepts in R and gradually moves to highly advanced topics. The breadth of the book can be estimated from the presence of dedicated chapters on topics as diverse as data frames, graphics, Bayesian statistics, and survival analysis. Essentially this is a must-have reference book for any wannabe R programmer, but for a beginner the thickness of the book could be intimidating.

2. R with Theory

An Introduction to Statistical Learning: with Applications in R – Gareth James et al.

YOU CANalytics Book Rating 5 Stars (5 / 5)

This book is a high-quality statistical text with R as the software of choice. If you want to get comfortable with fundamental concepts in parallel with learning R, then this is the book for you. Having said this, you will love this book even if you have studied advanced statistics. The book also covers some advanced machine learning concepts such as support vector machines (SVM) and regularization. A great book by all means.

Machine Learning with R – Brett Lantz

YOU CANalytics Book Rating 4.5 Stars (4.5 / 5)

If you want to learn R from the machine learning perspective, then this is the book for you. Some people take a lot of interest in the fine demarcation between statistics and machine learning; for me, however, there is too much overlap between the topics, and I have given up on the distinction as it makes no difference from the applications perspective. The book introduces the RWeka package; Weka is another open source tool used extensively in academic research.

3. R with Applications

R and Data Mining: Examples and Case Studies – Yanchang Zhao

YOU CANalytics Book Rating 4.3 Stars (4.3 / 5)

There are other books that use a case-study approach for readers to learn R. I like this one because of the interesting topics it covers, including text mining, social network analysis, and time series modeling. Having said this, the author could have put some effort into the formatting of this book, which is plain ugly; at times you will feel you are reading a master's-level project report while skimming through it. Once you get over this aspect, however, the content is really good for learning R.

Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) – Graham Williams

YOU CANalytics Book Rating 4.2 Stars (4.2 / 5)

Rattle is no SAS Enterprise Miner or SPSS Modeler (both commercial GUI-based data mining tools). However, trust me: apart from a few minor issues, Rattle is not at all bad. The book is a great reference to Rattle (a GUI add-on package for R to mine data) for data mining. I really hope they keep working on Rattle to make it better, as it has a lot of potential.

 4. R Graphics and Programming

ggplot2: Elegant Graphics for Data Analysis (Use R!) – Hadley Wickham

YOU CANalytics Book Rating 4 Stars (4 / 5)

ggplot2 is an exceptional package for creating wonderful graphics in R. It is much better than the base graphics that come pre-installed with R, so I would recommend you start directly with ggplot2 without wasting your time on base graphics. 'R for Everyone', the first book we discussed, has a good introduction to ggplot2. However, if you want to explore ggplot2 in greater depth, then this is the book for you.

Though I prefer ggplot2, Lattice is another package on par with it. A good book to start with is 'Lattice: Multivariate Data Visualization with R (Use R!)' by Deepayan Sarkar.

Read full list.


Emerging Storage


Emerging Storage, VMware And Pivotal Drive EMC’s Q2 Earnings

Trefis Team, Contributor

EMC announced its second quarter earnings on July 23, reporting 5% year-on-year growth in net revenues to $5.9 billion. The company's services revenues rose by almost 9% over the prior-year quarter to $2.6 billion, while its product revenues stayed flat at about $3.3 billion. Much of the growth was driven by VMware (+17%), Pivotal (+28%) and RSA Security (+7%), while core information storage revenues remained nearly flat at $4 billion.

EMC's market share in external storage systems declined from 30.2% in Q1 2013 to 29.1% in the first quarter of 2014, according to a recent report by IDC. This was the first quarter since 2008 in which EMC's market share declined year-over-year. EMC's revenues from external storage systems in Q1 declined by almost 9%, while the industry-wide decline was about 5%. However, EMC's revenues in Q2 grew faster than the industry average, so the company gained share in the market.

Weakness in its core business led to market speculation prior to earnings about EMC spinning off VMware and Pivotal. The Wall Street Journal reported that external pressure from EMC’s large institutional investors could lead the company to spin off some of the fastest-growing businesses within the company such as VMware and Pivotal. However, EMC’s management refuted the speculation and stood by its “federation” business model, wherein some of the acquired companies operate as separate entities while they still collaborate on products for large clients. The company believes that its current setup is ideal for growth for both EMC and the acquired companies.

We have a $30 price estimate for EMC, which is roughly in line with the current market price.

See our full analysis for EMC’s stock

Key Areas Of Growth:

Emerging Storage

EMC’s Emerging Storage products such as XtremIO, Isilon, Atmos and VPLEX were largely responsible for the growth in hardware sales during the past few quarters. The Emerging Storage sub-segment grew by 51% year-over-year (y-o-y) in Q1 2014, which the company attributed to a strong customer response for these products. Despite strong y-o-y growth, the revenues generated by emerging storage solutions stayed flat over Q1. The company attributed this to intermittent demand for some large individual orders. The company expects strong growth for emerging storage solutions on the back of solid demand for software-defined storage, Big Data analytics, cloud storage and flash arrays in the coming quarters.


VMware

VMware's revenues grew by 17% y-o-y to $1.45 billion for the June quarter, with growth coming from both product license revenues (+16%) and services revenues (+18%). However, VMware's gross margin within EMC declined by 180 basis points over the prior-year quarter to 87.8%. The decline in VMware's margins led EMC's overall gross margin to decline by 40 basis points to 62.1%. EMC has invested over $6 billion in acquisitions and internal development since 2012, of which a significant portion was attributable to VMware-related products. These acquisitions included software-defined networking leader Nicira and mobility management leader AirWatch. All the acquisitions will show up as losses on the income statement this year. However, management believes that margins are likely to improve in future quarters (read: SDN, Hybrid Clouds And AirWatch Help VMware Post Strong Q2 Results).


Pivotal

Pivotal is among the fastest-growing divisions within the company, with 40% y-o-y growth in the first quarter. Although Q2's growth rate of 29% was lower than the previous quarter's, the number of orders rose by over 50%. Additionally, Pivotal's margins expanded from the March quarter. Pivotal's platform consists of new-generation data fabrics, application fabrics, and a cloud-independent Platform-as-a-Service to support cloud computing and Big Data applications, which have started gaining traction among customers. Management mentioned that some of Pivotal's growth may not be immediately visible in the numbers, since it is building out a subscription-based revenue stream, which is likely to be beneficial in the long run.

RSA Security

RSA Security, EMC’s information security division, grew by over 11% to almost $1 billion in 2013. The growth continued in the first half of 2014, but the rate of growth was lower than 2013 at about 6% y-o-y. The information security industry is growing, with customers allocating more of their security budgets to intelligence-driven analytics, where RSA Information Security excels, rather than static prevention.


