3 Altair Transformations to Save You Time

Carlos D Serrano
Published in Stackademic
3 min read · Feb 6, 2024

We often want to visualize data in our dashboards or Streamlit apps without reshaping the same DataFrame several times, once per chart. Altair supports many transformations, and in this article I will walk you through three that save me time and effort and make my code smaller and more readable. I will use Streamlit to run all of the Altair visualizations.

Folding Columns

Dealing with multiple series is common, but what happens when the series live in different columns? The usual approach is to use pandas melt or unstack to create a new DataFrame that holds the data from these columns as key-value pairs. However, keeping several DataFrames around can consume memory and slow down your apps or dashboards. Altair's transform_fold method folds columns into key-value pairs inside the chart itself, so you can chart multiple columns without reshaping your DataFrame. You could get the same result by creating one chart per column and combining them in Altair, but the code would grow with every column you add, whereas transform_fold only needs one more column name in a list.

Sample Data from Vega-Datasets (seattle_weather)
import altair as alt
import streamlit as st
from vega_datasets import data

df = data.seattle_weather()

# Fold four wide columns into (measurement, value) pairs inside the chart
folded_chart = (
    alt.Chart(df)
    .mark_line()
    .transform_fold(
        fold=["precipitation", "temp_min", "temp_max", "wind"],
        as_=["measurement", "value"],
    )
    .encode(
        x=alt.X("date", type="temporal", timeUnit="yearmonth"),
        y=alt.Y("value", type="quantitative", aggregate="mean"),
        color=alt.Color("measurement", type="nominal"),
    )
)

st.altair_chart(folded_chart, use_container_width=True)
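
For comparison, here is a minimal sketch of the pandas reshape that transform_fold lets you skip. The two-row DataFrame is made-up illustration data, not the Seattle weather set; only the column names mirror the chart above.

```python
import pandas as pd

# Hypothetical two-row sample; the values are invented for illustration.
wide = pd.DataFrame(
    {
        "date": pd.to_datetime(["2012-01-01", "2012-01-02"]),
        "temp_min": [5.0, 2.8],
        "temp_max": [12.8, 10.6],
    }
)

# melt produces one row per (date, measurement) pair, the same long
# shape that transform_fold builds inside the chart specification.
long_df = wide.melt(
    id_vars="date",
    value_vars=["temp_min", "temp_max"],
    var_name="measurement",
    value_name="value",
)
print(long_df)
```

With melt you carry a second DataFrame around in Python; with transform_fold the reshaping is described in the chart and the original DataFrame stays as it is.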

Smoothing using LOESS

When working with timeline data, forecasts, and similar charts, a visualization occasionally needs a smooth curve through the points, usually to reduce the influence of outliers. LOESS (locally estimated scatterplot smoothing) is a commonly used method for this, and instead of creating a new dataset with the calculations, Altair offers the transform_loess method. Apply this transformation to your chart and layer the result over the original to show both the smoothed values and the raw data, as shown below.

# Base chart: daily maximum temperature as light gray bars
smoothen = (
    alt.Chart(df)
    .mark_bar(color="lightgray")
    .encode(
        x=alt.X("date", type="temporal", timeUnit="yearmonthdate"),
        y=alt.Y("temp_max", type="quantitative", aggregate="mean"),
    )
)
# Layer a LOESS-smoothed line over the bars; a smaller bandwidth follows the data more closely
st.altair_chart(
    smoothen + smoothen.transform_loess("date", "temp_max", bandwidth=0.1).mark_line(),
    use_container_width=True,
)
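
To give a feel for what the bandwidth parameter controls, here is a rough pure-Python sketch of the idea behind LOESS: for each x, fit a weighted straight line through the nearest fraction of the points. It is a simplified illustration (tricube weights, local linear fit, no robustness iterations) and not the implementation Altair or Vega uses; the function name loess_sketch is made up for this example.

```python
def loess_sketch(xs, ys, bandwidth=0.5):
    """Smooth ys over xs with a local, tricube-weighted linear fit."""
    n = len(xs)
    k = max(2, int(round(bandwidth * n)))  # number of neighbours per fit
    smoothed = []
    for x0 in xs:
        # Indices of the k points closest to x0, and the neighbourhood radius
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        radius = max(abs(xs[i] - x0) for i in nearest) or 1.0
        # Tricube weights: 1 at x0, falling to 0 at the edge of the neighbourhood
        w = [(1 - (abs(xs[i] - x0) / radius) ** 3) ** 3 for i in nearest]
        # Weighted least-squares line through the neighbourhood
        sw = sum(w)
        mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Invented data: a flat series with a single spike at position 5
xs = list(range(10))
ys = [0.0] * 10
ys[5] = 10.0
print(loess_sketch(xs, ys, bandwidth=0.5))
```

A larger bandwidth pulls in more neighbours per fit and gives a flatter curve; a smaller one, like the 0.1 used in the chart above, stays closer to the raw values.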

JoinAggregate

It is very common to aggregate values in datasets to display statistical information. In Altair, one way to display several aggregates on the same chart is to create a chart per aggregate and then combine these into a single chart.

The transform_joinaggregate method lets you define multiple aggregates, which are added as new fields on every row of the chart's data; the source DataFrame itself is not modified. This is a great advantage because these fields can then feed other transformations in the same chart. In this example, I define min, max, and mean aggregates for the miles per gallon column, grouped by the origin column. Once these three fields exist, I pass them to transform_fold to turn them into key-value pairs and create a multi-series chart. This lets me prepare and display three different aggregates in a single chart without combining separate charts and with very few lines of code.

Note: The order of the columns in your transform_fold affects the “z-index” of the marks in your chart.

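# data is the vega_datasets loader (from vega_datasets import data)
cars = data.cars()
ranked_chart_mpg = (
    alt.Chart(cars)
    .mark_rect()
    # Add per-Origin aggregates as new fields on every row
    .transform_joinaggregate(
        mean_mpg="mean(Miles_per_Gallon)",
        max_mpg="max(Miles_per_Gallon)",
        min_mpg="min(Miles_per_Gallon)",
        groupby=["Origin"],
    )
    # Fold the three aggregates into (measurement, value) pairs;
    # the order here sets the drawing order of the marks
    .transform_fold(
        fold=[
            "max_mpg",
            "mean_mpg",
            "min_mpg",
        ],
        as_=["measurement", "value"],
    )
    .encode(
        x=alt.X(
            "Origin",
            type="nominal",
        ),
        y=alt.Y("value", type="quantitative"),
        color=alt.Color("measurement", type="nominal"),
    )
)

st.altair_chart(ranked_chart_mpg, use_container_width=True)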
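
If it helps to picture what a joinaggregate produces, the closest pandas analogue I know of is groupby followed by transform. The sketch below uses a made-up four-row stand-in for the cars data (values invented for illustration) and the same field names as the chart above.

```python
import pandas as pd

# Hypothetical miniature of the cars data; the values are invented.
cars_small = pd.DataFrame(
    {
        "Origin": ["USA", "USA", "Japan", "Japan"],
        "Miles_per_Gallon": [18.0, 22.0, 30.0, 34.0],
    }
)

# groupby(...).transform keeps one row per input row and repeats each
# group's aggregate on every row of that group, like a joinaggregate.
grouped = cars_small.groupby("Origin")["Miles_per_Gallon"]
joined = cars_small.assign(
    mean_mpg=grouped.transform("mean"),
    max_mpg=grouped.transform("max"),
    min_mpg=grouped.transform("min"),
)
print(joined)
```

Unlike a plain aggregate, which would collapse the data to one row per Origin, the row count stays the same, which is why the new fields can be folded afterwards.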

Conclusion

Altair transformations are a handy way of transforming your data for visualization purposes without altering your source data or duplicating your datasets. There are many other transformation types that can suit your data visualization needs. Thanks for reading!

Sr. Solution Innovation Architect @ Snowflake • Streamlit • DataOps • Hispanic Data Community Leader • 🇵🇷 ▶️ 🇺🇸