Analysis of Electric Car Data#

Author: Yongjun Zhu

Course Project, UC Irvine, Math 10, F23

Introduction#

Introduce your project here. Maybe 3 sentences.

In the era of sustainable transportation, electric vehicles (EVs) have emerged as a pivotal force reshaping the automotive landscape. As we witness a surge in the adoption of electric cars, understanding the nuances of their performance and efficiency becomes crucial for both consumers and industry stakeholders. This project delves into an analysis of electric car data, focusing on whether a diference in power train affect the range, top speed, efficiency.

Importing and Cleaning the data#

You can either have all one section or divide into multiple sections. To make new sections, use ## in a markdown cell. Double-click this cell for an example of using ##

import pandas as pd
import altair as alt
import seaborn as sns
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
df = pd.read_csv("ElectricCarData_Clean.csv")
df
Brand Model AccelSec TopSpeed_KmH Range_Km Efficiency_WhKm FastCharge_KmH RapidCharge PowerTrain PlugType BodyStyle Segment Seats PriceEuro
0 Tesla Model 3 Long Range Dual Motor 4.6 233 450 161 940 Yes AWD Type 2 CCS Sedan D 5 55480
1 Volkswagen ID.3 Pure 10.0 160 270 167 250 Yes RWD Type 2 CCS Hatchback C 5 30000
2 Polestar 2 4.7 210 400 181 620 Yes AWD Type 2 CCS Liftback D 5 56440
3 BMW iX3 6.8 180 360 206 560 Yes RWD Type 2 CCS SUV D 5 68040
4 Honda e 9.5 145 170 168 190 Yes RWD Type 2 CCS Hatchback B 4 32997
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
98 Nissan Ariya 63kWh 7.5 160 330 191 440 Yes FWD Type 2 CCS Hatchback C 5 45000
99 Audi e-tron S Sportback 55 quattro 4.5 210 335 258 540 Yes AWD Type 2 CCS SUV E 5 96050
100 Nissan Ariya e-4ORCE 63kWh 5.9 200 325 194 440 Yes AWD Type 2 CCS Hatchback C 5 50000
101 Nissan Ariya e-4ORCE 87kWh Performance 5.1 200 375 232 450 Yes AWD Type 2 CCS Hatchback C 5 65000
102 Byton M-Byte 95 kWh 2WD 7.5 190 400 238 480 Yes AWD Type 2 CCS SUV E 5 62000

103 rows × 14 columns

df.isnull().values.any()
False

Since the output is false, this Dataset is clean. After make sure that the data is clean, we now need to basicly describe the relevant variables I would use.

# Calculate mean, median, standard deviation
mean_Efficiency_WhKm = df['Efficiency_WhKm'].mean()
median_Efficiency_WhKm = df['Efficiency_WhKm'].median()
std_Efficiency_WhKm = df['Efficiency_WhKm'].std()
print (mean_Efficiency_WhKm)
print (median_Efficiency_WhKm)
print (std_Efficiency_WhKm)
189.16504854368932
180.0
29.566839230892835
mean_TopSpeed_KmH = df['TopSpeed_KmH'].mean()
median_TopSpeed_KmH = df['TopSpeed_KmH'].median()
std_TopSpeed_KmH = df['TopSpeed_KmH'].std()
print (mean_TopSpeed_KmH)
print (median_TopSpeed_KmH)
print (std_TopSpeed_KmH)
179.19417475728156
160.0
43.573030481499785
mean_Range_Km = df['Range_Km'].mean()
median_Range_Km = df['Range_Km'].median()
std_Range_Km = df['Range_Km'].std()
print (mean_Range_Km)
print (median_Efficiency_WhKm)
print (std_Efficiency_WhKm)
338.7864077669903
180.0
29.566839230892835

For efficiency, the mean and median are close, indicating a symmetric distribution. The relatively low standard deviation suggests consistency in acceleration values. For top speed, the mean is higher than the median, suggesting a right-skewed distribution with a few vehicles having very high top speeds. For range, the mean is significantly higher than the median, indicating a right-skewed distribution with a few vehicles having much longer ranges.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from mpl_toolkits.axes_grid1 import make_axes_locatable
sns.set(style="whitegrid")
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 10))
sns.histplot(df['Efficiency_WhKm'], bins=10, kde=True, ax=axes[0, 0])
axes[0, 0].set_title('Efficiency Distribution')
sns.histplot(df['Range_Km'], bins=10, kde=True, ax=axes[0, 1])
axes[0, 1].set_title('Range Distribution')
sns.histplot(df['FastCharge_KmH'], bins=10, kde=True, ax=axes[1, 0])
axes[1, 0].set_title('FastCharge Distribution')
sns.histplot(df['TopSpeed_KmH'], bins=10, kde=True, ax=axes[1, 1])
axes[1, 1].set_title('TopSpeed distribution')
plt.tight_layout()
plt.show()
../../_images/06d72d6869538ab7a151bf51ac371ad33669c48d8c2d0c0694a14fd728a1f128.png

Predicting Factor Whether Diffference in Power Train is Important.#

I want to explore whether a diference in power train affect the range, top speed, or efficiency. For example, does AWD has a higher top speed than RWD or does RWD has a higher efficiency?

Decision Tree Regression#

df
Brand Model AccelSec TopSpeed_KmH Range_Km Efficiency_WhKm FastCharge_KmH RapidCharge PowerTrain PlugType BodyStyle Segment Seats PriceEuro
0 Tesla Model 3 Long Range Dual Motor 4.6 233 450 161 940 Yes AWD Type 2 CCS Sedan D 5 55480
1 Volkswagen ID.3 Pure 10.0 160 270 167 250 Yes RWD Type 2 CCS Hatchback C 5 30000
2 Polestar 2 4.7 210 400 181 620 Yes AWD Type 2 CCS Liftback D 5 56440
3 BMW iX3 6.8 180 360 206 560 Yes RWD Type 2 CCS SUV D 5 68040
4 Honda e 9.5 145 170 168 190 Yes RWD Type 2 CCS Hatchback B 4 32997
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
98 Nissan Ariya 63kWh 7.5 160 330 191 440 Yes FWD Type 2 CCS Hatchback C 5 45000
99 Audi e-tron S Sportback 55 quattro 4.5 210 335 258 540 Yes AWD Type 2 CCS SUV E 5 96050
100 Nissan Ariya e-4ORCE 63kWh 5.9 200 325 194 440 Yes AWD Type 2 CCS Hatchback C 5 50000
101 Nissan Ariya e-4ORCE 87kWh Performance 5.1 200 375 232 450 Yes AWD Type 2 CCS Hatchback C 5 65000
102 Byton M-Byte 95 kWh 2WD 7.5 190 400 238 480 Yes AWD Type 2 CCS SUV E 5 62000

103 rows × 14 columns

cols = ["TopSpeed_KmH", "Range_Km",'Efficiency_WhKm']
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=2)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[cols], df["PowerTrain"], test_size=0.1, random_state=1)
clf.fit(X_train, y_train)
DecisionTreeClassifier(max_leaf_nodes=3, random_state=2)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

fig = plt.figure()
_ = plot_tree(clf, 
                   feature_names=clf.feature_names_in_,
                   class_names=clf.classes_,
                   filled=True)
../../_images/33af7e4a9144c437723cc63332e2e61ad3753788dd10a7d0d3f594b6486846ba.png
clf.score(X_test, y_test)
0.9090909090909091
clf.classes_
array(['AWD', 'FWD', 'RWD'], dtype=object)
rng = np.random.default_rng()
arr = rng.random(size = (5000,3))
df2 = pd.DataFrame(arr, columns=cols)
df2.head(100)
df2["TopSpeed_KmH"] *= 533
df2["Range_Km"] *= 2000
df2['Efficiency_WhKm']*= 1500
df2['pred'] = clf.predict(df2[cols])
df2
TopSpeed_KmH Range_Km Efficiency_WhKm pred
0 75.763884 255.966114 900.393787 FWD
1 216.167973 987.532089 507.045214 AWD
2 230.895722 911.609365 734.088010 AWD
3 168.836826 410.808157 122.222156 FWD
4 118.586470 280.029296 1214.505492 FWD
... ... ... ... ...
4995 400.444607 1747.976327 1220.161437 AWD
4996 365.754845 1779.344836 742.714966 AWD
4997 362.125188 593.176167 5.623374 AWD
4998 222.911450 1436.147150 395.546491 AWD
4999 234.150228 875.772255 273.838896 AWD

5000 rows × 4 columns

alt.Chart(df2).mark_circle().encode(
    x = "TopSpeed_KmH",
    y = "Efficiency_WhKm",
    color = 'pred'
)

Thus, from the above graph, we can easily see that the power train is irrelevant to efficiency of those EVs. However, the topspeed does. Even if the top speeds of FWD and RWD are almost the same, both smaller than 173.5km/h, cars with AWD can drive much faster than RWD and FWD.

alt.Chart(df2).mark_circle().encode(
    x = "TopSpeed_KmH",
    y = "Range_Km",
    color = 'pred'
)

From the above graph, we can see that compare to RWD, FWD EVs has more range. While, still, AWD get the highest topspeed. It seems that power train related to the range and the topspeed, instead of the efficiency.

alt.Chart(df2).mark_circle().encode(
    x = "Efficiency_WhKm",
    y = "Range_Km",
    color = 'pred'
)

This graph proves our thoughts above that the data of eficiency are all random, and only RWD and FWD affect the range.

Some Concerns about the Relation#

We do show that the power training does affected EVs’ topspeed and range. However, price is also a primary effect. In our common sense, FWD and RWD vehicles are more affordable, while AWD EVs are more expensive. Let’s us check whether price is also affected.

df
Brand Model AccelSec TopSpeed_KmH Range_Km Efficiency_WhKm FastCharge_KmH RapidCharge PowerTrain PlugType BodyStyle Segment Seats PriceEuro
0 Tesla Model 3 Long Range Dual Motor 4.6 233 450 161 940 Yes AWD Type 2 CCS Sedan D 5 55480
1 Volkswagen ID.3 Pure 10.0 160 270 167 250 Yes RWD Type 2 CCS Hatchback C 5 30000
2 Polestar 2 4.7 210 400 181 620 Yes AWD Type 2 CCS Liftback D 5 56440
3 BMW iX3 6.8 180 360 206 560 Yes RWD Type 2 CCS SUV D 5 68040
4 Honda e 9.5 145 170 168 190 Yes RWD Type 2 CCS Hatchback B 4 32997
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
98 Nissan Ariya 63kWh 7.5 160 330 191 440 Yes FWD Type 2 CCS Hatchback C 5 45000
99 Audi e-tron S Sportback 55 quattro 4.5 210 335 258 540 Yes AWD Type 2 CCS SUV E 5 96050
100 Nissan Ariya e-4ORCE 63kWh 5.9 200 325 194 440 Yes AWD Type 2 CCS Hatchback C 5 50000
101 Nissan Ariya e-4ORCE 87kWh Performance 5.1 200 375 232 450 Yes AWD Type 2 CCS Hatchback C 5 65000
102 Byton M-Byte 95 kWh 2WD 7.5 190 400 238 480 Yes AWD Type 2 CCS SUV E 5 62000

103 rows × 14 columns

cols2 = ["TopSpeed_KmH", "Range_Km",'PriceEuro']
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=2)
from sklearn.model_selection import train_test_split
X_train_copy, X_test_copy, y_train_copy, y_test_copy = train_test_split(df[cols2], df["PowerTrain"], test_size=0.1, random_state=1)
clf.fit(X_train_copy, y_train_copy)
DecisionTreeClassifier(max_leaf_nodes=3, random_state=2)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
clf.score(X_test_copy, y_test_copy)
0.7272727272727273
rng = np.random.default_rng()
arr = rng.random(size = (5000,3))
df3 = pd.DataFrame(arr, columns=cols2)
df3.head(100)
df3["TopSpeed_KmH"] *= 533
df3["Range_Km"] *= 2000
df3['PriceEuro']*= 150000
df3['pred'] = clf.predict(df3[cols2])
df3
TopSpeed_KmH Range_Km PriceEuro pred
0 477.074797 236.322462 105212.397918 AWD
1 264.183237 148.540958 17435.334464 RWD
2 501.443481 1650.618871 120598.841358 AWD
3 85.999767 1937.402358 28153.593600 FWD
4 235.734838 645.247924 120656.262036 AWD
... ... ... ... ...
4995 296.087939 1224.857933 51461.448756 AWD
4996 59.041436 664.649628 51193.516643 AWD
4997 52.304701 567.583976 141330.737613 AWD
4998 70.680920 1163.603349 70373.781378 AWD
4999 123.760752 1991.547730 131496.304429 AWD

5000 rows × 4 columns

alt.Chart(df3).mark_circle().encode(
    x = "TopSpeed_KmH",
    y = "PriceEuro",
    color = 'pred'
)

Thus, we found that AWD EVs have higher price than FWD and RWD. Due to this high price, it is reasonable for AWD to have a higher topspeed but not range.

Summary#

Normally, all else being equal, AWD vehicles tend to have slightly lower ranges compared to their RWD or FWD counterparts. This is because AWD systems generally involve more components and increased weight, which can affect energy efficiency. However, since in reality, AWD EVs always have a hgher price, this “slightly low” might not reflect from the analysis. The choice of powertrain can influence the top speed capabilities of an electric vehicle. AWD vehicles might have higher top speeds, especially in performance-oriented models, due to improved traction and stability. Generally, FWD electric vehicles tend to be more energy-efficient compared to AWD models. This is because AWD systems introduce additional mechanical components and increase the overall weight of the vehicle.

References#

Your code above should include references. Here is some additional space for references.

  • What is the source of your dataset(s)?

  1. Datasource: ev-database.org/

  2. Banner image: freepik - author - ‘macrovector’

  • List any other references that you found helpful.

AWD, FWD, or RWD—Which Wheel Drive Is Best?

Front-Wheel Drive vs. Rear-Wheel Drive | Pros & Cons ()