The @WalmartLabs Blog

Read about how we’re innovating the way our customers shop.

  1. Delivering Exceptional Product Content: Solving a Universal Problem

    Posted by Bao Nguyen

    Written by Ram Rampalli, global head of content acquisition

    The Internet and social media have reshaped the retail landscape, transforming it from a static, linear world into a dynamic, networked one. Whereas local brick-and-mortar stores were once the center of the retail universe, more and more of today’s consumers enjoy the convenience of shopping anytime, anywhere: in stores, online, or via mobile devices. Traditional media, social media, and the Internet are the new knowledge gatekeepers and increasingly influence consumer purchasing behavior. However influential they may be, social product ratings are clearly subjective, and online product information is not always accurate and often lacks consistency from site to site.

    Since today’s connected consumers are more likely to comparison shop, this lack of consistent, credible information often leads to a frustrating and longer-than-desired shopping experience. We’ve all been there! This is why Walmart has implemented a pilot program called the Product Content Collection System (PCCS), which lets suppliers send their catalogs directly to us. The crux of the PCCS program is a specification (a set of instructions covering the list of attributes, their requirement levels, and the data transfer protocol) that suppliers can use as a framework for sharing this data with Walmart. We are encouraging suppliers to provide us with product content for their entire catalog, whether or not it is currently carried within the Walmart retail ecosystem. We also welcome content from suppliers who don’t currently sell through Walmart.
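
    As a purely hypothetical sketch of how a spec of this shape might be used in practice (the attribute names and requirement levels below are invented for illustration; the real specification is available on request at the address at the end of this post), a supplier-side script could validate each catalog entry against the spec before transmission:

        # Hypothetical sketch of a PCCS-style attribute spec and validator.
        # Attribute names and requirement levels are invented; the actual
        # specification is available from the Walmart content team.
        SPEC = {
            "gtin":        {"required": True},   # Global Trade Item Number
            "productName": {"required": True},
            "brand":       {"required": True},
            "description": {"required": False},
            "imageUrl":    {"required": False},
        }

        def validate_item(item: dict) -> list[str]:
            """Return a list of problems; an empty list means the item passes."""
            missing = [a for a, rule in SPEC.items()
                       if rule["required"] and not item.get(a)]
            return [f"missing required attribute: {a}" for a in missing]

        item = {"gtin": "00012345678905", "productName": "Widget", "brand": "Acme"}
        print(validate_item(item))  # [] -> ready to submit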

    Our goal is to provide the most accurate, up-to-date, and comprehensive product information to our customers, regardless of how they shop, to improve their shopping experience. We are not reinventing the wheel here. Many of our large suppliers are already part of the B2B-centric Global Data Sync Network (GDSN) spearheaded by GS1 and a number of retailers and suppliers. At the same time, the explosive growth of our Marketplace program has introduced smaller and mid-sized suppliers, sellers, and DSVs to the Walmart ecosystem that are not part of the GDSN program. We want to make it as easy as possible for suppliers to upload their catalogs to us, whether through the GDSN, the PCCS, or both. If a supplier already works with third-party content service providers like Salsify, Shotfarm, ARS, and Kwikee, we are agnostic to the approach. As long as the data comes to us through the right channel and is approved by the supplier, we can import it, and customers will benefit from it. Having the full product catalog ensures we have great content for the customer experience, and it also allows for rapid setup of items if and when Walmart decides to carry them.

    We expect to launch this program later this calendar year. In the meantime, if you would like to join the dozens of suppliers currently participating in the pilot and submitting their product catalogs to us, or get a sneak preview of the PCCS specification, please feel free to send us an email at contentacq@walmart.com.

  2. Why we chose OpenStack for Walmart Global eCommerce

    Posted by Bao Nguyen

    Written by Amandeep Juneja, senior director of cloud operations and engineering, @WalmartLabs

    When people think of Walmart and the nitty-gritty of how we do business, they usually think about merchandising, item placement, and inventory management: the hallmarks of running a global chain of retail stores. So Walmart’s decision to invest in cloud infrastructure might not be something you’d expect from a brick-and-mortar retailer with over $480B in revenue.

    So why did we decide to invest so heavily in the cloud?

    The answer is that Walmart has always relied on cutting-edge technology to fuel our growth. Walmart was a pioneer in opening up our inventory systems to vendors to reduce inventory costs and bring lower prices to our customers. We were the first company to connect our store network via satellite communication, long before the advent of the Internet, enabling us to reach consumers who until then had no access to discount stores.

    We’ve always sought to expand fast, adapt to changing consumer preferences, and keep the costs of operations low.

    Walmart is growing fast, and Walmart Global eCommerce is leading the charge. Our customers want to use our eCommerce platform from many different access points, not only from their home computers but also from mobile phones, tablets, and kiosks within Walmart retail stores—always expecting a seamless experience.

    With such rapid growth, we needed a technology stack that would scale to meet explosive demand, that was flexible enough to build applications that adapt to ever-changing user preferences, and that had enough big-data smarts to predict what customers want and provide them with recommendations.

    For traditional businesses, growing in size means that economies of scale kick in, leading to lower per-unit costs. On the retail side, Walmart has always enjoyed these cost savings, which we’ve passed down to our customers in the form of our trademark “everyday low prices.”

    But when it comes to technology, things aren’t so simple. As a company’s technology footprint increases, the expansion can lead to “diseconomies of scale,” meaning the cost per transaction actually goes up. At that point, the cost of doing business rises with volume, and having more users is actually bad for business.

    Such an inversion of the cost curve can happen because of bad infrastructure architecture: you find yourself locked into a particular system or application, where scaling vertically costs far more than scaling horizontally. It can also be the result of bad application design: maintaining and adding new features becomes a nightmare, raising the opportunity cost of delivering new products.

    That’s where cloud architecture comes in. Instead of expanding vertically by, say, buying big, powerful machines at ten times the cost, distributed computing lets you use a large number of commodity machines to spread work out and gather the results, providing the same power at a fraction of the cost of traditional data centers and infrastructure.

    A second benefit of the cloud is that distributed architecture provides a higher degree of resilience and reliability. A single machine can go down, but the odds that ten machines go down at once are much lower.
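
    To put rough numbers on that intuition: if each machine fails independently with probability p, the chance that all n replicas are down at the same time is p^n. A back-of-the-envelope sketch in Python (the 1% failure rate is an assumption for illustration):

        # Independent failures multiply: redundancy buys reliability cheaply.
        p = 0.01  # assume each commodity machine is down 1% of the time
        for n in (1, 2, 10):
            print(f"P(all {n} machines down at once) = {p ** n:.0e}")
        # 1e-02, 1e-04, 1e-20 -- ten cheap machines are effectively never all down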

    Last year, @WalmartLabs made a decision. To meet the challenges of eCommerce 3.0, we needed to overhaul our technology stack and the tech vision that goes with it. We decided to build an elastic cloud, running applications in a service-oriented architecture.

    We wanted to choose the platform that would best support our application developers, enabling them to rapidly build all kinds of applications, including mobile apps, web apps, and RESTful APIs for vendors. A platform that would empower product managers to iterate on new product ideas in an agile manner. A platform that would enable Walmart to respond to customer needs more efficiently.

    We chose OpenStack as our cloud platform, not only because it’s best of breed, but also because open source software comes with several big advantages:

    • Using open source means we avoid long-term lock-ins with any single private vendor.
    • More importantly, we know that Walmart Global eCommerce is growing into something unique. Using open source means we can modify and customize software to meet our needs.
    • Finally, OpenStack has a true community around it. It’s been used and supported by market leaders all over the Bay Area. Walmart wants to be part of that community. We have a team of very talented developers, and we plan to contribute aggressively to the open source community.

    In the nine months since we started building our OpenStack cloud, we’ve already built an OpenStack Compute layer with 100K cores and counting. Our next step is to bring in more block storage using Cinder and to venture into software-defined networking with Neutron. We’re currently building a multi-petabyte object store using Swift.
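
    For readers who want a feel for what driving an OpenStack compute layer looks like, here is a minimal sketch using the openstacksdk Python client. This is generic OpenStack usage, not the internal tooling described above, and the cloud profile, image, flavor, and network names are placeholders:

        # Boot a VM against an OpenStack Compute (Nova) endpoint with openstacksdk.
        # Credentials come from a clouds.yaml profile or environment variables.
        import openstack

        conn = openstack.connect(cloud="my-cloud")  # placeholder cloud profile

        image = conn.compute.find_image("cirros")       # placeholder image name
        flavor = conn.compute.find_flavor("m1.small")   # placeholder flavor name
        network = conn.network.find_network("private")  # placeholder network name

        server = conn.compute.create_server(
            name="demo-instance",
            image_id=image.id,
            flavor_id=flavor.id,
            networks=[{"uuid": network.id}],
        )
        server = conn.compute.wait_for_server(server)  # block until ACTIVE
        print(server.name, server.status)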

    A lot of people use OpenStack, but what makes Walmart’s OpenStack project so exciting is the scale of our investment. Over 140 million customers shop in our stores and online in the US every week. Unlike many other large installations, we’re using the OpenStack platform for real production loads. By the holidays last year, Walmart.com’s entire U.S. production traffic was running on OpenStack compute.

    This is an incredibly exciting time. Around the world, eCommerce will only continue to grow, and Walmart is lucky to have the opportunity to contribute to the technologies that will make it happen.

  3. Data Science in Search @Labs: an Interview with Dr. Manas Pathak

    Posted by Bao Nguyen

    Data science is fast becoming one of the most ubiquitous and powerful fields today. Sought after across multiple industries to generate value from ever-growing volumes of data, data scientists have become not only a valuable asset but also a critical part of the team that any company must build and retain. @WalmartLabs is no exception to this trend; for Polaris, the @Labs team that builds the search engine and discovery experience powering Walmart.com, data science is an integral part of everything we do, from building ranking algorithms and new features to assortment analytics and user behavior analysis. Dr. Manas Pathak, a staff software engineer on our Search Relevance team, is a testament to this and is responsible for several core features used by over a billion people to search and connect to products. Drawing on his experience here at @Labs, Dr. Pathak recently authored a book, Beginning Data Science with R. He also took some time to answer some questions about his work here, his thoughts on data science, and what inspired him to write a book.

    You work on complex data science problems here in search at @WalmartLabs. Can you give us a brief overview of some of your work here and what kinds of problems you’re solving?

    @WalmartLabs is one of the best places in the world to do data science. The biggest opportunity here comes from the gigantic amount of data. Even small insights obtained from this data often lead to huge business impact in absolute terms. This is especially true for the search team, where we use this data to build and optimize algorithms that help users find the products they are looking for.

    I have contributed to multiple core search relevance features, including click engagement modeling, where we build a statistical model that learns from past customer behavior to improve the ranking of search results. This feature is a major component of the search ranking algorithm powering Walmart.com and other eCommerce websites and has led to significant improvements in site-wide conversion rates and revenue. Another set of features I have worked on is left-hand navigation, where we determine the important attributes (categories and facets) for a given search query. I have created multiple models to rank these attributes in the most relevant order.
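
    As a toy illustration of the click-engagement idea (this is a textbook technique with invented numbers, not the production model), one classic approach is to smooth each item’s observed click-through rate with a Beta prior, so that items with little exposure are neither over- nor under-ranked:

        # Toy sketch: Beta-smoothed click-through rate as a ranking signal.
        ALPHA, BETA = 3.0, 97.0  # hypothetical prior, ~3% baseline CTR

        def smoothed_ctr(clicks: int, impressions: int) -> float:
            """Posterior mean CTR under a Beta(ALPHA, BETA) prior."""
            return (clicks + ALPHA) / (impressions + ALPHA + BETA)

        items = {
            "item-a": (900, 30_000),  # lots of evidence, ~3% raw CTR
            "item-b": (5, 50),        # little evidence, 10% raw CTR
        }
        ranked = sorted(items, key=lambda i: smoothed_ctr(*items[i]), reverse=True)
        for i in ranked:
            print(i, round(smoothed_ctr(*items[i]), 4))  # item-b 0.0533, item-a 0.03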

    To many people who have just heard the term “data science,” it seems very much a new field, although we can perhaps more accurately describe it as an extension of statistics and computer science. It has certainly jumped into the limelight, and its popularity continues to reach new heights. How do you see data science evolving? What about the role and skills of the data scientist?

    Fundamentally, data science is the methodology of extracting useful insights from data. More than being an extension of any single discipline, data science sits at the intersection of programming, statistics, and domain knowledge. For the most part, data science is not entirely new; techniques in these areas have been practiced for decades under different names. Only in the last few years have we seen these techniques grouped under the banner of data science. As the field evolves, I foresee it becoming standardized, both in its techniques and its tools. I also foresee R becoming the dominant tool for data science, with its vast package system providing open source implementations of most data analysis techniques.

    With the growing popularity of data science, there is an acute shortage of skilled data science professionals in industry. An important area of standardization is the training of data scientists. Currently, most data scientists have a background in computer science, statistics, or one of the quantitative sciences such as physics or math. A few universities, such as Columbia and UC Berkeley, already offer professional master’s programs in data science. I foresee more universities offering a standardized data science curriculum at both the undergraduate and graduate levels.

    As the popularity of data science has risen, the field has attracted newcomers from various disciplines who want to either understand data science or become data scientists themselves. Coming from a strong technical background and having spent time across various data science teams and companies, how do you recommend beginners start learning data science?

    Being an applied discipline, data science can only be learned effectively by doing. It is helpful for newcomers to get a good understanding of the different aspects of data science, but the biggest benefit comes from trying them out on their own datasets. For newcomers in industry, the best way to learn is to look for insights in the data from their own business processes. Students can similarly analyze public datasets and participate in open data analysis competitions such as the ones on Kaggle. A good starting point is to get hands-on experience with data science and R through real-world case studies.

    What was your main motivation in writing a book on data science?

    Over the years, I have gained a lot of experience applying data science to diverse datasets with the R programming language, especially here in Search @WalmartLabs. I found that most other books on R focused either on the features of the R language alone or on a specialized application area. Neither category is very useful for readers who do not already have a good understanding of data science concepts. My motivation in writing this book was to provide an intuitive understanding of data science techniques as well as the steps to carry them out with R. The goal is to help readers quickly get started with their own data science problems.

    In Beginning Data Science with R, you state your aim of striking a balance between the “how” and the “why” of various data science techniques. I think this is a fantastic idea, but it is rather difficult to execute; many introductory books tend to be either purely intuitive or highly practical. Can you speak to how you aimed to accomplish this sought-after balance in the book?

    Maintaining this balance was one of the main challenges in writing this book. In my experience, most books on R are either too hard for beginners or too shallow for intermediate or expert readers. I wanted this book to be accessible to readers without a background in data science, so I spent a lot of time covering the basics and introducing the R programming language.

    For every data science methodology, the book introduces the motivation and gives an intuitive understanding before diving into the technical details. A great way to cover both the “how” and the “why” of data science together is through case studies, so I included an analysis of a real-world dataset in each chapter. Designing the features I have worked on here in Search required applying many of the data science techniques that I cover in the book.

    About Manas Pathak: Dr. Manas A. Pathak received a BTech degree in computer science from Visvesvaraya National Institute of Technology, Nagpur, India, in 2006, and MS and PhD degrees from the Language Technologies Institute at Carnegie Mellon University (CMU) in 2009 and 2012, respectively. His PhD thesis, “Privacy-Preserving Machine Learning for Speech Processing,” was published as a monograph in the Springer best thesis series. His research received significant press coverage, including articles in The Economist and MIT Technology Review. He has many years of experience with data analysis using the R programming language. He is currently a staff software engineer in Search Relevance on the Polaris team at @WalmartLabs.