Plotly Python Tutorial for Machine Learning Specialists – hkshco.com

Plotly is an open-source Python graphing library for building interactive visualizations. It is a useful tool for discovering patterns in a dataset before delving into machine learning modeling. In this article, we will look at how to use it in an example-driven way. 

Some of the visualizations you can expect to see include:

  • line plots, 
  • scatter plots, 
  • bar charts, 
  • error bars, 
  • box plots, 
  • histograms, 
  • heatmaps, 
  • subplots, 
  • and bubble charts. 


Why you would choose Plotly

Now, the truth is that you can still get some of these visualizations using Matplotlib, Seaborn, or Bokeh. There are a few reasons why you would choose Plotly:

  • the visualizations are interactive by default, unlike the static figures produced by Seaborn and Matplotlib
  • it’s quite straightforward to generate complicated visuals using Plotly’s high-level Express API
  • Plotly also provides a framework known as Dash (often called Plotly Dash) that you can use to host your visualizations as well as machine learning projects
  • you can generate HTML code for your visualizations and, if you like, embed it on your website

That said, generating the visualizations requires a cleaned dataset. That’s a crucial step; otherwise, your visuals will deliver the wrong information. In this article, we skip the cleaning and pre-processing part to focus on the visualizations. We’ll provide the entire notebook used at the end of the tutorial. 

It is important that you also keep in mind best practices when creating visualizations, for example:

  • use colors that are eye-friendly
  • ensure that the numbers add up, for example in a pie chart the percentages should total 100%
  • use the right color scale so that it is immediately clear to the viewer which color represents the higher number and which one represents the lower
  • don’t put too much data in the same visual, for example, you can group and plot the topmost items instead of plotting everything in the dataset
  • ensure that the plot is not too busy
  • always add the source of your data, even when you are the one who has collected it. It builds credibility. 
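
The advice to group the data and plot only the topmost items can be sketched with pandas before any plotting happens. This is a minimal sketch on made-up data: the `talks` frame, its values, and the choice of three items are assumptions for illustration, not the dataset used in this article.

```python
import pandas as pd

# Hypothetical talk-level data; the values are invented for illustration.
talks = pd.DataFrame({
    "event": ["TED2010", "TED2010", "TEDx", "TED2012", "TEDGlobal", "TEDx", "TED2014"],
    "views": [1200, 800, 300, 2500, 950, 150, 400],
})

# Sum the views per event, then keep only the three largest totals.
views_top = (
    talks.groupby("event", as_index=False)["views"]
    .sum()
    .nlargest(3, "views")
    .reset_index(drop=True)
)

print(views_top)
```

A small frame like `views_top` is easier to read in a bar chart than one bar per event in the full dataset.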

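We can interact with the Plotly API in two ways: through the high-level Plotly Express module (`plotly.express`) and through the lower-level Plotly Graph Objects module (`plotly.graph_objects`). 

In this piece, we’ll be using them interchangeably. 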

Plotly histogram

A histogram represents the distribution of numerical data by grouping the values into bins and showing the count for each bin. In Plotly, the bins can also be aggregated with a function other than the count, such as the sum or the average, and the data to be binned can be categorical as well. Here’s an example:

import plotly.express as px
fig = px.histogram(views, x="views")
fig.show()

Plotly bar chart

A bar chart is a great visualization when you want to display a categorical column against a numerical column. It shows the value of the numerical column for every category. Plotly Express makes it very easy to plot one. 

fig = px.bar(views_top, x='event', y='views')
fig.show()

You are not limited to vertical bar charts; you can also draw horizontal ones. This is done by setting `orientation` to 'h' and swapping the x and y columns. 

fig = px.bar(views_top, x='views', y='event', orientation='h')
fig.show()

Plotly pie chart

A pie chart is another visualization type for showing the number of items in every category. It lets the viewer quickly determine the share of a particular item or value in the whole dataset. Let’s show how one can be plotted using Plotly’s Graph Objects this time. 

import plotly.graph_objects as go

fig = go.Figure(
    data=[
        go.Pie(labels=labels, values=values)
    ])
fig.show()

Plotly donut chart

You can change the above visual to a donut chart by specifying the `hole` parameter. This is the fraction of the pie’s radius to cut out of the center, given as a number between 0 and 1. 

fig = go.Figure(
    data=[
        go.Pie(labels=labels, values=values, hole=0.2)
    ])
fig.show()

Plotly scatter plot

Scatterplots are great for determining whether there is a relationship or correlation between two numerical variables.

fig = px.scatter(df, x='comments', y='views')
fig.show()

Plotly line chart

A line chart is mainly used to show how a certain numerical value changes over time or over a certain interval. 

fig = px.line(talks, x="published_year", y="number_of_events")
fig.show()

Plotly annotations

Adding text labels and annotations is quite straightforward in Plotly. In a scatter plot, this can be done by specifying the `text` parameter, which labels each point with the value from the given column. 

fig = px.scatter(df, x='comments', y='views', color='duration', text="published_day")
fig.show()

Plotly 3D scatter

In Plotly, a 3D scatterplot can be created by passing the x, y, and z parameters.

fig = px.scatter_3d(df, x='comments', y='views', z='duration', color='views')
fig.show()

Plotly Write to HTML

Plotly also allows you to save any of your visualizations to an HTML file. This is surprisingly easy to do. 

fig.write_html("3d.html")

Plotly 3D surface

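Let’s now look at how to plot a 3D surface in Plotly. Unlike the 3D scatter, `go.Surface` only requires z, passed as a 2D array of heights; x and y are optional and default to the column and row indices of that array. In the example below, the three numeric columns of the data frame are used as that 2D array.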

fig = go.Figure(data=[go.Surface(z=df[['duration','views','comments']].values)])

fig.update_layout(title='3D Surface', autosize=False,
                  width=500, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))

fig.show()

Plotly bubble chart

A Plotly bubble chart is very similar to a scatterplot. In fact, it is built from the scatterplot. The only item we add to it is the size of the bubble. 

fig = px.scatter(df, x='comments', y='views', size='duration',
                 color='num_speaker', log_x=True, size_max=60)
fig.show()

Plotly table

Plotly can also be used to visualize a data frame as a table. We can use the Graph Objects `Table` trace to achieve this. We pass the header and the cells to the table. We can also specify the styling as shown below:

# Show only the two columns of interest; header and cells must list the same columns
columns = ['event', 'views']
fig = go.Figure(data=[
    go.Table(
        header=dict(values=columns, fill_color='yellow'),
        cells=dict(values=[views_top[c] for c in columns],
                   fill_color='paleturquoise'),
    )
])
fig.show()

Plotly heatmap

We can use a density heatmap to visualize the 2D distribution of an aggregate function. The data is binned along the x and y axes, and the aggregate function is applied to the variable passed as z. The function can be the sum, the average, or even the count. 

fig = px.density_heatmap(df, x="published_year", y="views", z="comments")
fig.show()

Plotly Animations

Plotly animations can be used to show how certain values change over time. To achieve that, one has to define the `animation_frame`, the column whose values become the frames of the animation. In this case, it’s the year.

fig = px.scatter(df, x="duration", y="comments", animation_frame="published_year",
                 size="duration", color="published_day")
fig.show()

Plotly box plot

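A box plot summarizes the data through its quartiles: the box spans the first to the third quartile, with a line at the median. Points falling beyond the whiskers, which by default extend up to 1.5 times the interquartile range from the box, are drawn individually as potential outliers in your dataset.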

fig = px.box(df, x="published_day", y="duration")
fig.show()
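
The usual rule for flagging outliers in a box plot, 1.5 times the interquartile range beyond the first and third quartiles, can be checked by hand with pandas. This is a sketch on made-up durations. Plotly computes quartiles with its own method, so the fences it draws can differ slightly from the pandas values below.

```python
import pandas as pd

# Hypothetical talk durations in minutes; the last value is deliberately extreme.
duration = pd.Series([10, 12, 13, 14, 15, 16, 18, 60], name="duration")

# First and third quartiles and the interquartile range (IQR).
q1 = duration.quantile(0.25)
q3 = duration.quantile(0.75)
iqr = q3 - q1

# Fences of the 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the candidates for outliers.
outliers = duration[(duration < lower_fence) | (duration > upper_fence)]
print(outliers.tolist())
```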

Plotly maps

In order to work with maps in Plotly, you will need to head over to Mapbox and grab your Mapbox access token. With the token at hand, you can visualize your data on a map in Plotly. This is done using `scatter_mapbox` while passing the latitude and the longitude columns. 

px.set_mapbox_access_token('YOURTOKEN')
fig = px.scatter_mapbox(df, lat="lat", lon="lon",
                        color="region",
                        size="views",
                        color_continuous_scale=px.colors.cyclical.IceFire,
                        size_max=15)
fig.show()

Plotly Subplots

With Plotly, we can also visualize multiple plots in the same figure. This is done using subplots. Here the subplots are created by defining a `facet_col`: the figure is split into one panel per unique value in the `facet_col` column. 

fig = px.scatter(df, x="duration", y="comments",
                 animation_frame="published_month", animation_group="event",
                 facet_col="published_day", width=1500, height=500,
                 size="views", color="published_day")
fig.show()

Plotly error bars

Error bars are used to show the variability of data in a visualization. Generally, they help in showing the estimated error or the precision of a certain measure. The length of the error bar reveals the level of uncertainty. Longer error bars indicate that the data points are more spread out, and hence that the measure is more uncertain. They can be applied to graphs such as line charts, bar graphs, and scatter plots.

fig = go.Figure(
    data=[
        go.Bar(
            x=views_top['event'], y=views_top['views'],
            error_y=dict(type='data', array=views_top['error'].values)
        )
    ])
fig.show()

Final thoughts

Hopefully, this piece has shown you how you can use Plotly in your next machine learning workflow. You can even use it to visualize the performance metrics of your machine learning models. Unlike static plotting tools such as Matplotlib and Seaborn, its visuals are interactive as well as eye-catching. 

The interactivity enables you to zoom in and out of specific parts of the graph. In this way, you can analyze your graph in more detail. Specifically, we have seen how you can use popular graphs such as histograms, bar charts, and scatter plots in Plotly. We have also seen that we can build multiple plots in the same figure as well as visualize data on a map. 

The Notebook used can be found here. 

Happy plotting – no pun intended!
