Tesla and Waymo's Big Data Management Competition

What is the difference between data and big data? Put simply, conventional data management means keeping a database of the information needed to perform various calculations. Big data is when a system draws large amounts of data from many sources and updates it constantly. This "big data" is continuously analyzed and processed to produce results.

Big data often involves messy, human-related information that simple fixed formulas cannot handle, so it relies on statistical methods and techniques such as fuzzy logic. In other words, a company's list of products and sales accounts is not big data per se, but the NYC transport system's processing of passengers' commuting times, locations, destinations, ages and sexes is.

How data is accumulated and processed determines how efficient the system will be. Tesla and Waymo both manage large data sets from their autonomous vehicle (AV) pilots. Both gather enormous amounts of information from various input sources and must analyze it with algorithms that evolve over time. This continual refinement of processing is how both companies manage their data sets; however, each approaches it differently.

You might wonder how there can be different ways to manipulate data. In simple computations there isn't much difference: processing speed is what distinguishes one simple process from another. With big data, however, how you collect it, where and how you store it, and which algorithms you use to process it define the nature of the system. Add to this the computers used to process big data. These are not standard desktop PCs but supercomputers that cost millions to design and purchase.

AV is not just about how a car negotiates its environment to navigate from one point to another. It is also about how a vehicle avoids accidents within a carefully coordinated, complex traffic system. The ultimate goal is a human-free environment in which all the drivers are computers; until we reach that state, however, there will be a few decades of mixed AV and human driving to get through.

The two recent fatal accidents show that AV technology is far from perfect. In one, an Uber test vehicle hit a pedestrian; in the other, a car driving in autonomous mode hit a barrier, killing its driver. The first accident showed a system unable to cope with an object appearing in its path, something that will happen often as human drivers and AV systems share the road. The second showed an AV failing to manage at all: in that instance, neither automatic braking nor disengagement kicked in.

These accidents raise many questions, three of which stand out. First, what safety systems can be put in place to help avoid such accidents? Second, how reliable and how fast is the data collected during the ride? Third, how quickly and accurately does the AV system process the incoming data and turn it into decisions?

There are five distinct stages of data management involved in AV:

  1. Collection
  2. Processing
  3. Translation-Decision Making
  4. Action
  5. Review
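The five stages can be pictured as one loop that runs many times per second. The sketch below is a toy illustration, not Tesla's or Waymo's software: the class and method names, the 30-metre safety gap and the idea that every reading is a distance to the nearest obstacle are all assumptions made to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Hypothetical five-stage AV data loop: collect, process, decide, act, review."""
    history: list = field(default_factory=list)  # review log for future decisions

    def collect(self, raw_readings):
        # Stage 1: keep only readings that actually carry a value
        return [r for r in raw_readings if r is not None]

    def process(self, readings):
        # Stage 2: reduce the readings to one estimate (nearest obstacle, metres)
        return min(readings) if readings else float("inf")

    def decide(self, nearest_obstacle_m, safe_gap_m=30.0):
        # Stage 3: translate the estimate into a decision
        return "brake" if nearest_obstacle_m < safe_gap_m else "cruise"

    def act(self, decision):
        # Stage 4: a real vehicle would command actuators; here we just return it
        return decision

    def review(self, nearest_obstacle_m, action):
        # Stage 5: record the situation and the action taken for later analysis
        self.history.append((nearest_obstacle_m, action))

    def step(self, raw_readings):
        readings = self.collect(raw_readings)
        estimate = self.process(readings)
        action = self.act(self.decide(estimate))
        self.review(estimate, action)
        return action


pipeline = Pipeline()
print(pipeline.step([42.0, None, 25.5]))  # brake
print(pipeline.step([80.0, 95.0]))        # cruise
```

Keeping the review log inside the same object reflects the point that every outcome feeds back into later decisions.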

Collection relies on the various sensors that gather data from the vehicle's surroundings. LiDAR is one such system, but it is not the only one: AVs also rely on cameras, infrared and radar. Another source is an integrated communication system through which AVs share their GPS positions to negotiate where they are relative to other vehicles nearby. All of this incoming data streams into a single AV processing unit that must quickly and efficiently convert it into a form suitable for decision making.
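Because every sensor produces its own stream, the processing unit first has to put the readings into a single time order. A minimal sketch, assuming each stream already arrives sorted by timestamp; the sensor names, timestamps and values are invented for illustration. The interleaving itself uses Python's standard `heapq.merge`.

```python
import heapq

# Hypothetical timestamped readings: (seconds, sensor, value). Each stream is
# already in time order, as it would be coming off its own sensor.
lidar = [(0.00, "lidar", 31.2), (0.10, "lidar", 30.8)]
camera = [(0.03, "camera", "pedestrian"), (0.07, "camera", "pedestrian")]
radar = [(0.05, "radar", 31.0)]


def merged_stream(*streams):
    """Interleave several time-ordered sensor streams into one ordered feed."""
    # heapq.merge is lazy: it holds only one pending item per stream at a time
    return heapq.merge(*streams, key=lambda reading: reading[0])


for timestamp, sensor, value in merged_stream(lidar, camera, radar):
    print(f"{timestamp:.2f}s {sensor}: {value}")
```

Merging lazily matters when the streams never end: the unit can act on each reading as soon as it is the earliest one pending.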

The decision-making process draws on a constantly reviewed database of options from which the AV chooses, weighing each option's estimated probability of success to reach a decision. There is no "hunch" driving here; it all comes down to logic and calculations of safety, direction and speed.
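Choosing among options by their estimated chances of success can be sketched as a filter followed by a ranking. This is an illustration only: the `Option` fields, the 0.99 safety threshold and the rule of preferring progress among options that clear it are assumptions, not how Tesla or Waymo actually score manoeuvres.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    p_success: float  # estimated probability the manoeuvre completes safely
    progress: float   # hypothetical 0..1 measure of how far it advances the trip


def choose(options, min_safety=0.99):
    """Pick the most useful manoeuvre among those that clear a safety threshold."""
    safe = [o for o in options if o.p_success >= min_safety]
    if not safe:
        # Nothing clears the bar: fall back to the single safest option
        return max(options, key=lambda o: o.p_success)
    # Among safe options, prefer progress, then the higher success estimate
    return max(safe, key=lambda o: (o.progress, o.p_success))


options = [
    Option("change lane left", p_success=0.970, progress=0.9),
    Option("keep lane", p_success=0.995, progress=0.6),
    Option("slow down", p_success=0.999, progress=0.3),
]
print(choose(options).name)  # keep lane
```

Filtering on safety before ranking on progress means a fast but risky manoeuvre can never outscore a slower safe one.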

Once the system reaches a decision, it turns it into an action that makes the car respond as the AV pilot intends. The response is recorded, and the outcome is saved and reviewed as a subset of data for future decision making. Every outcome serves as a statistical reference for future decisions.
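Saving each outcome as a reference for future decisions amounts to keeping statistics per situation. A minimal sketch, assuming outcomes are simply labelled success or failure; the situation and action labels are invented, and Laplace smoothing is one simple way, among many, to estimate a rate from few observations.

```python
from collections import defaultdict


class OutcomeLog:
    """Keeps per-situation success counts so future decisions can use them."""

    def __init__(self):
        # (situation, action) -> [successes, attempts]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, situation, action, succeeded):
        entry = self.counts[(situation, action)]
        entry[0] += int(succeeded)
        entry[1] += 1

    def success_rate(self, situation, action):
        successes, attempts = self.counts[(situation, action)]
        # Laplace smoothing: an untried action starts at 0.5 instead of 0 or 1
        return (successes + 1) / (attempts + 2)


log = OutcomeLog()
for ok in (True, True, True, False):
    log.record("merge onto highway", "accelerate", ok)
print(log.success_rate("merge onto highway", "accelerate"))  # 4/6, about 0.667
print(log.success_rate("merge onto highway", "brake"))       # 0.5, never tried
```

The smoothing keeps a single lucky or unlucky outcome from producing an overconfident estimate of 0% or 100%.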

The Tesla and Waymo data gathering difference

The main difference between Tesla and Waymo lies in how they obtain their data. Waymo uses a private fleet of specially equipped vehicles tested on regulated roads around the US. Tesla gathers data from its customers' cars worldwide, including whenever drivers engage the Autopilot feature.

The contrast between the two collection samples is clearest when you weigh quantity against quality.

Tesla works with large data sets from "low quality" sources: it collects more than 3 million miles of data per day from over 300,000 privately owned cars and has passed 5 billion miles of data streaming in from every time those vehicles were driven, not just in Autopilot mode. Even when it is not engaged, Autopilot also reports how it would have handled a situation, which lets Tesla compare human and AV handling of the same situation.
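Comparing what the software would have done with what the driver actually did is, at its core, a diff over logged decisions. A rough sketch under stated assumptions: the log format and action labels are hypothetical, and Tesla's real pipeline is not public in this form. Frames where the two disagree are the interesting ones to send back for review.

```python
def shadow_disagreements(log):
    """Return the frames where the (unengaged) autopilot would have acted differently.

    Each log entry is (frame_id, human_action, autopilot_action).
    """
    return [frame for frame, human, auto in log if human != auto]


# Hypothetical log from one drive with Autopilot not engaged
drive_log = [
    (1, "keep lane", "keep lane"),
    (2, "brake", "keep lane"),  # human braked, software would not have
    (3, "keep lane", "keep lane"),
    (4, "change lane", "brake"),
]
flagged = shadow_disagreements(drive_log)
print(flagged)                                 # [2, 4]
print(f"{len(flagged) / len(drive_log):.0%}")  # 50%
```

Uploading only the flagged frames, rather than every mile, is one way such a comparison could keep the data volume manageable.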

Waymo has logged around 5 million miles of road driving in total with 500 to 600 vehicles, plus 5 billion simulated miles. Simulation, however, is not real-life driving. The key difference is that Waymo's 5 million miles are concentrated AV driving, using LiDAR and other sensor arrays in fully autonomous mode. This is focused, high-quality AV driving compared with Tesla's billions of semi-autonomous miles driven without LiDAR.
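Putting the quoted figures side by side shows what "numbers versus quality" means in practice. This is plain arithmetic on the numbers above and says nothing about the quality of each mile: the two fleets' miles are not equivalent, since one is semi-autonomous and the other fully autonomous.

```python
# Figures quoted in the text; Waymo's fleet size is given as a 500-600 range.
tesla_miles_per_day = 3_000_000
tesla_cars = 300_000
waymo_real_miles = 5_000_000
waymo_sim_miles = 5_000_000_000
waymo_fleet = (500, 600)

miles_per_tesla_per_day = tesla_miles_per_day / tesla_cars
miles_per_waymo_car = [waymo_real_miles / n for n in waymo_fleet]
sim_to_real_ratio = waymo_sim_miles / waymo_real_miles
# How long Tesla's fleet takes to log as many raw miles as Waymo's total
days_for_tesla_to_match = waymo_real_miles / tesla_miles_per_day

print(miles_per_tesla_per_day)                  # 10.0
print([round(m) for m in miles_per_waymo_car])  # [10000, 8333]
print(sim_to_real_ratio)                        # 1000.0
print(round(days_for_tesla_to_match, 1))        # 1.7
```

On average each Tesla contributes about 10 miles a day, Tesla's fleet matches Waymo's entire real-world total in under two days, and Waymo runs a thousand simulated miles for every real one.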

Waymo's Big change

Waymo expects that once its fleet grows to include the thousands of Chrysler minivans and Jaguars it intends to put on the streets with full LiDAR arrays, its incoming data will grow dramatically. In effect, Waymo expects to gain access to billions of miles of focused data.

The different data sources

Waymo relies on LiDAR and keeps upgrading this data source with ever more accurate laser-pulse sensing. Tesla relies on cameras and radio waves (radar), and Elon Musk, Tesla's co-founder and CEO, claims that LiDAR does not improve AV safety. Musk is betting that he can build an effective sensing system without spending heavily on laser-based sensing units (LiDAR).

Not everyone agrees with Musk. One such critic is Raj Rajkumar, co-director of the General Motors-sponsored connected and autonomous driving research lab at Carnegie Mellon University, who argues that Tesla's hardware, which lacks LiDAR, is not as capable as a LiDAR-based setup: "We don't think the hardware will be sufficient to do that, and I don't think Tesla is particularly anywhere close to getting to [fully] driverless operation".

The upshot is that the two companies differ on two fronts: in the sensors that serve as the data source, and in how the data is collected during the ride. The result is two completely different approaches to AV technology.

Processing Data

Musk has said of processing the information that comes in during rides: "It's actually quite a challenge to process that data, and then train against that data, and have the vehicle learn effectively from the data because it's just a vast quantity."

Waymo runs simulations on all the data it receives, and these simulations increase confidence in the data sets. Waymo has 25,000 virtual cars testing the real-life data across thousands of scenarios, creating a feedback loop that links real-life and virtual processing.
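Testing real-life data "across thousands of scenarios" can be pictured as taking one logged event and replaying many variations of it. Waymo's actual simulator is far more sophisticated and not public; this toy sketch only shows the idea of a parameter grid. The logged values, the scaling factors, the 0.2 s latency and the 7 m/s² braking are all hypothetical.

```python
from itertools import product


def stops_in_time(car_speed_mps, gap_m, latency_s=0.2, decel_mps2=7.0):
    """Toy pass/fail check for one scenario variant (stationary hazard ahead)."""
    stopping_m = car_speed_mps * latency_s + car_speed_mps ** 2 / (2 * decel_mps2)
    return stopping_m < gap_m


# One logged real-world situation: a hazard appeared 25 m ahead at 12 m/s.
logged = {"car_speed_mps": 12.0, "gap_m": 25.0}

# Scale the logged values to produce "what if" variants of the same event.
speed_factors = [0.8, 1.0, 1.2, 1.5]
gap_factors = [0.6, 0.8, 1.0]

failures = [
    (s, g)
    for s, g in product(speed_factors, gap_factors)
    if not stops_in_time(logged["car_speed_mps"] * s, logged["gap_m"] * g)
]
print(f"{len(failures)} of {len(speed_factors) * len(gap_factors)} variants fail")
```

The logged event itself passes, but faster or closer variants expose failures that the real drive never encountered, which is the value simulation adds to each real mile.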

Rajkumar noted that the difference in approach also comes down to budget: Waymo has Alphabet's resources behind it, while Tesla's are much smaller, so Tesla cannot or will not spend as much as Waymo on virtual testing, which requires heavy hardware investment.

The advantage of simulation

Nvidia is another competitor in the race, and it concentrates on simulation rather than real-life data gathering. It has worked with Uber as well as on its own software products, racking up billions of simulated miles.

According to Danny Shapiro, senior director of automotive at Nvidia, simulation is useful for focusing on the more "interesting" situations. As he told the media: "There's no way we can possibly drive around and capture all the crazy stuff that happens on the roads. There are trillions of miles that are driven, [but] a lot of those, the majority of those are very boring miles. After a certain point, you've mastered that."

Nvidia has developed "Drive Constellation," a simulation system that companies involved in AV development can adapt as a simulation package.

One of the major issues with any simulation is handling "exceptions," the rare edge cases. Software systems are typically built around standard, common situations, with exceptions handled as an add-on for risk mitigation. With driving, however, risk mitigation is the main issue. Driving is more than following a route between two points; it is about negotiating all the exceptions met along the way. AV development is therefore about exceptions rather than the "boring" bulk of driving, such as a 100-mile stretch of empty desert road: it is about driving safely and efficiently amid chaos, such as in the streets of Mumbai.

Simulation comes in handy for such scenarios, since it would be nearly impossible to send a fleet of AVs out to roam in intense traffic such as the heart of NYC or downtown LA. Not everyone is convinced, however. Nidhi Kalra, senior information scientist at the RAND Corporation, has concerns about simulation: "The problem with any simulator is that it's a simplification of the real world. Even if it simulates the world accurately, if all you're simulating is a sunny day in Mountain View with no traffic, then what is the value of doing a billion miles on the same cul-de-sac in Mountain View? I'm not saying that's what anyone's doing, but without that information, we can't know what a billion miles really means."

Kalra goes on to explain why she is wary of relying on simulated miles: "If I tell you I've played a billion miles of Grand Theft Auto, it doesn't make me a good driver. When a company says 'we've driven this many miles in simulation,' I think, 'Well, I'm glad you've got a simulator.' Real-world miles still really, really matter. That's where, literally, the rubber meets the road, and there's no substitute for it."
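The debate over exceptions and "what a billion miles really means" can be made concrete with a little probability. If an edge case appears on average once every N miles, the chance of meeting it at least once in n miles is 1 - (1 - p)^n with p = 1/N. The sketch assumes miles are independent and uses an invented rate of one event per 10 million miles; it illustrates how few rare situations ordinary miles cover, whether those miles are real or simulated.

```python
import math


def p_at_least_one(rate_per_mile, miles):
    """Chance of meeting a rare event at least once, assuming independent miles."""
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(miles * math.log1p(-rate_per_mile))


def miles_needed(rate_per_mile, confidence=0.95):
    """Miles required to see the event at least once with the given confidence."""
    return math.log1p(-confidence) / math.log1p(-rate_per_mile)


# Hypothetical edge case that shows up once every 10 million miles
rate = 1e-7
print(round(p_at_least_one(rate, 1_000_000), 3))  # 0.095
print(f"{miles_needed(rate):,.0f}")               # roughly 30 million miles
```

Even a million miles gives under a 10% chance of meeting this one situation, which is why the mix of miles matters as much as their total.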

Bottom Line

There is no single best path forward; progress is a process of trial and error. Neither simulation alone nor real-world driving alone is enough. You need both, and the optimum is to rack up billions of real-life miles alongside billions of simulated miles and use both to extract data for more efficient decision making.

The same goes for data collection: neither LiDAR alone nor a radio-wave (radar) system alone will work perfectly. You need to combine all data sources into a comprehensive, efficient system; LiDAR, radar and infrared all contribute to a complete view of the world.
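One standard way to combine sensors that measure the same quantity is inverse-variance weighting: each reading is weighted by its precision, and the fused estimate is more precise than any single input. A minimal sketch, assuming independent Gaussian errors; the noise figures for LiDAR, radar and camera are illustrative guesses, not specifications of any real sensor. Production systems typically use Kalman filters, which apply the same idea over time.

```python
def fuse(estimates):
    """Inverse-variance weighted average of independent distance estimates.

    estimates: list of (distance_m, std_dev_m) pairs, one per sensor.
    Returns (fused_distance_m, fused_std_dev_m).
    """
    weights = [1.0 / (sd ** 2) for _, sd in estimates]
    total = sum(weights)
    fused = sum(w * d for w, (d, _) in zip(weights, estimates)) / total
    return fused, (1.0 / total) ** 0.5


# Hypothetical readings of the same obstacle; noise levels are assumptions
readings = [
    (30.2, 0.1),  # LiDAR: precise range
    (29.0, 1.0),  # radar: noisier range
    (31.5, 2.0),  # camera: coarse depth estimate
]
distance, sd = fuse(readings)
print(f"{distance:.2f} m ± {sd:.2f} m")  # 30.19 m ± 0.10 m
```

The noisier sensors barely shift the result here, but if the LiDAR reading were missing (in heavy fog, say), radar and camera would still supply a usable estimate.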

Even this is not enough, however; you also need a fast processing unit that can act on a decision immediately, with emergency braking as one of the available responses.
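Why processing speed matters can be quantified with the standard stopping-distance formula: distance travelled during the system's processing delay plus v²/(2a) for braking. The sketch assumes a stationary obstacle, a constant 8 m/s² deceleration and a 2 m safety margin, all illustrative values rather than figures from any real vehicle.

```python
def stopping_distance(speed_mps, latency_s, decel_mps2=8.0):
    """Distance travelled before stopping: latency travel plus v^2 / (2a)."""
    return speed_mps * latency_s + speed_mps ** 2 / (2 * decel_mps2)


def must_brake_now(gap_m, speed_mps, latency_s, decel_mps2=8.0, margin_m=2.0):
    """True when the gap to a stationary obstacle leaves no room to wait."""
    return gap_m <= stopping_distance(speed_mps, latency_s, decel_mps2) + margin_m


speed = 20.0  # m/s, about 72 km/h
for latency in (0.1, 0.5):
    d = stopping_distance(speed, latency)
    print(f"latency {latency:.1f}s -> {d:.1f} m to stop")
```

At 72 km/h, an extra 0.4 s of processing delay adds 8 m to the stopping distance. With a 30 m gap, a 0.1 s system can still wait (`must_brake_now(30, 20, 0.1)` is `False`), while a 0.5 s system must brake immediately (`True`).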

The only conclusion I can reach is that we are much further from a true AV society than we are being led to expect. Sure, Waymo has its 500 AVs running in fully autonomous mode in safe, geofenced areas. Tesla has hundreds of thousands of cars on the road, some of which run in semi-autonomous mode, but none fully autonomously. Add to this Uber, GM and other companies working on similar projects.

What Uber faced in Tempe is only the tip of the iceberg of what cars face daily. Now imagine what an AV would do on a winter drive through snow in Alaska when a moose wanders into the road in minimal visibility. That's one extreme; the other would be driving through a sandstorm in fast traffic in Egypt. Or consider mingling with thousands of cars, bikes, tuk-tuks and pedestrians in some of Asia's busiest commercial cities.