Python-based Jupyter notebooks mostly consist of Python code, obviously, and some Markdown text. But they also offer some very handy functions and shortcuts which are not available in Python itself, and which are really helpful for interactive work. This is my personal best of / reference.
The shortcuts fall into two groups:

- magics: special functions with special syntax whose names start with `%` (in which case they apply to the rest of the line → "line magics") or `%%` (in which case they apply to the entire cell → "cell magics")
- command line programs: if you know how to use command line programs, you can do so directly from the notebook by prefixing the command line invocation with `!`
If you want to follow along, the easiest way is to just click this link and have Binder launch a Jupyter environment with the notebook loaded for you. Or you can download this post in its original notebook format here and load it into your own Jupyter instance yourself.
NB: Most of the following also applies to the IPython REPL.
## Magics
The syntax of magic functions is modeled after the syntax of command line programs:
- to call them, just write their name and evaluate the cell, without any parentheses (unlike regular Python functions, which are called like this: `function()`)
- arguments are separated just by whitespace (in Python, there are commas: `function(arg1, arg2)`)
- some have optional arguments (options) which tweak their behavior: these are formed by a hyphen and a letter, e.g. `-r`
### Getting help
You can read more about the magic function system by calling the `%magic` magic:
%magic
`%quickref` brings up a useful cheat sheet of special functionality:
%quickref
If you want more information about an object, `%pinfo` and `%pinfo2` are your friends:
def foo():
    "This foo function returns bar."
    return "bar"
# shows the object's docstring
%pinfo foo
# shows the full source code
%pinfo2 foo
These are so handy that they have their own special syntax: `?` and `??`, placed either before or after the object's name:
?foo
foo?
??foo
foo??
Of course, this also works with magic functions:
?%pylab
You can also open a documentation popup by pressing `Shift+Tab` with your cursor placed in or after a variable name. Repeating the shortcut cycles through different levels of detail.
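Outside of IPython and Jupyter, the standard library's `inspect` module offers much the same introspection as `%pinfo` and `%pinfo2`. A minimal sketch, using a standard library function as the inspected object:

```python
import inspect
import textwrap

# roughly what %pinfo shows: the object's docstring
print(inspect.getdoc(textwrap.dedent))

# roughly what %pinfo2 shows: the full source code
# (works whenever the source file is available, as in normal CPython installs)
print(inspect.getsource(textwrap.dedent))
```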
### Manipulating objects
The appeal of an interactive environment like Jupyter is that you can inspect any object you're working with by just evaluating it:
foo = 1
foo
`%who` and `%whos` will show you all the objects you've defined:
%who
%whos
Sometimes though, these objects are large and you don't want to litter your notebook with tons of output you'll delete right afterwards. (Also, if you forget to delete it, your notebook might get too large to save.) That's when you need to use the Jupyter pager, which lets you inspect an object in a separate window.
foo = "This is a line of text.\n" * 1000
%page foo
By default, the pager pretty-prints objects using the `pprint()` function from the `pprint` module. This is handy for collections, because it nicely shows the nesting hierarchy, but not so much for strings, because special characters like newlines `\n` are shown as escape sequences. If you want the string to look like it would if it were a text file, pass the `-r` option ("raw") to page through the result of calling `str()` on the object instead:
%page -r foo
If you want to inspect the source code of a module, use `%pfile` on the object representing that module, or on an object imported from that module:
import os
from random import choice
%pfile os
%pfile choice
Sometimes, you create an object which you know you will want to reuse in a different session or maybe in a completely different notebook. A lightweight way to achieve this is to use the `%store` magic:
%store foo
You can list the values stored in your database by invoking `%store` without arguments:
%store
To restore a variable from the database into your current Python process, use the `-r` option:
# restores only `foo`
%store -r foo
# restores all variables in the database
%store -r
And this is how you clear no longer needed variables from storage:
# removes `foo`
%store -d foo
# removes all variables
%store -z
%store
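Roughly speaking, `%store` pickles values into IPython's per-profile database. In plain Python, you can get a similar lightweight persistence with the standard library's `pickle` module. A sketch, where `foo.pkl` is just an illustrative filename:

```python
import pickle
from pathlib import Path

foo = {"numbers": [1, 2, 3], "label": "example"}

# persist the object to a file...
Path("foo.pkl").write_bytes(pickle.dumps(foo))

# ...and load it back, e.g. in a different session
restored = pickle.loads(Path("foo.pkl").read_bytes())
print(restored == foo)
```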
### Working with the file system
`%ls` lists files in the directory where your notebook is stored:
%ls
If you provide a path as argument, it lists that directory instead:
%ls /etc/nginx
If you provide a glob pattern, then only files that match it are listed:
%ls /etc/nginx/*.conf
`%ll` ("long listing") formats the listing as one entry per line, with columns providing additional information:
%ll ~/edu/
One of those columns shows file sizes, which is great, but the sizes are in bytes, which is less great (hard to read at a glance). The `-h` option prints file sizes in a human-readable format:
%ll -h ~/edu/python/syn*
`%%writefile` writes the contents of a cell to a file:
%%writefile foo.py
def foo():
    "This foo function returns bar."
    return "bar"
`%cat` prints the contents of a file into the notebook:
%cat foo.py
`%cat` is called `%cat` because it can also concatenate multiple files (or the same file, multiple times):
%cat foo.py foo.py
The output of `%cat` can be saved into a file with `>` (if the file exists, it's overwritten):
%cat foo.py foo.py >3foos.py
%cat 3foos.py
Hey! Our `3foos.py` is one foo short. Let's add it by appending to the file with `>>`:
%cat foo.py >>3foos.py
%cat 3foos.py
There, much better.
`%less` opens a file in the pager (with nice syntax highlighting if it's a Python source file):
%less foo.py
`%less` is named after the program `less`, which is used to page through text files at the command line. Why is the original `less` called "less"? Because an earlier pager program was called `more` (as in "show me more of this text file"), and as the saying goes, "less is more".

(Programmers are fond of dad jokes. I like how this one works on multiple levels -- the literal meaning that `less`-the-program is intended to replace `more`-the-program interacts with the figurative meaning that having less is better than having more, and both coalesce into "use `less` because it's better than `more`".)
`%cat` and `%ls` are also named after corresponding command line programs.
### Finding out more about your code
When developing, code often behaves differently from what you intended when you wrote it. The following tools might help you find out why.
Timing the execution of a piece of code will help you determine whether it's slowing you down. The `%timeit` magic has your back: it runs your code repeatedly and thus provides more reliable estimates. It comes in both line and cell variants.
%timeit sorted(range(1_000_000))
%%timeit
lst = list(range(1_000_000))
sorted(lst)
The cell variant can include initialization code on the first line, which is run only once:
%%timeit lst = list(range(1_000_000))
sorted(lst)
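`%timeit` is built on the standard library's `timeit` module, which you can also use directly outside a notebook. A minimal sketch, with the setup string playing the role of the first line of `%%timeit`:

```python
import timeit

# time the statement over a fixed number of runs;
# the setup code runs once, before timing starts
seconds = timeit.timeit(
    "sorted(lst)",
    setup="lst = list(range(1_000))",
    number=100,
)
print(f"{seconds / 100:.6f} s per run")
```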
If you have the `memory_profiler` library installed, you can load its magic extension and use `%memit` in the same way as `%timeit` to get a notion of how much memory your code is consuming.
%load_ext memory_profiler
%memit list(range(1_000_000))
Peak memory is the highest total amount of memory the Python process used when your code ran. Increment is peak memory minus the amount of memory Python used before your code ran.
%%memit
lst = list(range(1_000_000))
even = [i for i in lst if i % 2 == 0]
%%memit lst = list(range(1_000_000))
even = [i for i in lst if i % 2 == 0]
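If you'd rather not install an extension, the standard library's `tracemalloc` can give a comparable peak-usage figure. A sketch; note the numbers will differ from `%memit`, because `tracemalloc` counts only Python-level allocations it traced, not process-level memory:

```python
import tracemalloc

tracemalloc.start()
lst = list(range(100_000))
even = [i for i in lst if i % 2 == 0]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"peak: {peak / 1e6:.1f} MB")
```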
If you have a more involved piece of code where multiple functions are called, you may need more granular information about running times than that provided by `%timeit`. In that case, you can resort to profiling using the `%prun` magic. Profiling tells you how fast different parts of your code run relative to each other, in other words, where your bottlenecks are.
import time

def really_slow():
    time.sleep(1)

def fast():
    pass

def only_slow_because_it_calls_another_slow_function():
    fast()
    really_slow()
%prun only_slow_because_it_calls_another_slow_function()
The results show up in the pager; here's a copy:
7 function calls in 1.001 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.001 1.001 1.001 1.001 {built-in method time.sleep}
1 0.000 0.000 1.001 1.001 {built-in method builtins.exec}
1 0.000 0.000 1.001 1.001 <ipython-input-81-8d3b1f67a0d9>:3(really_slow)
1 0.000 0.000 1.001 1.001 <ipython-input-81-8d3b1f67a0d9>:9(only_slow_because_it_calls_another_slow_function)
1 0.000 0.000 1.001 1.001 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 <ipython-input-81-8d3b1f67a0d9>:6(fast)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
`%prun` also has a cell variant:
%%prun
really_slow()
fast()
6 function calls in 1.001 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.001 1.001 1.001 1.001 {built-in method time.sleep}
1 0.000 0.000 1.001 1.001 {built-in method builtins.exec}
1 0.000 0.000 1.001 1.001 <string>:2(<module>)
1 0.000 0.000 1.001 1.001 <ipython-input-81-8d3b1f67a0d9>:3(really_slow)
1 0.000 0.000 0.000 0.000 <ipython-input-81-8d3b1f67a0d9>:6(fast)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
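`%prun` is a front-end to Python's built-in profiler; outside IPython, the standard library's `cProfile` and `pstats` modules give you the same kind of report. A sketch:

```python
import cProfile
import io
import pstats
import time

def really_slow():
    time.sleep(0.1)

profiler = cProfile.Profile()
profiler.enable()
really_slow()
profiler.disable()

# print stats sorted by internal time, like %prun's default ordering
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats()
print(buf.getvalue())
```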
Perhaps the most useful magic for development is `%debug`, which allows you to pause the execution of a piece of code, examine the variables which are defined at that moment in time, resume execution fully or step by step, etc. You can either pass a statement that you want to debug as an argument:
def foo():
    for i in range(10):
        print("printing", i)
%debug foo()
Or you can invoke plain `%debug` after an exception has been raised to jump directly to the place where the error occurred, so that you can figure out why things went wrong:
def foo():
    dct = dict(foo=1)
    return dct["bar"]
foo()
%debug
If you want to pause one of your functions and explore its state at a particular point, set a breakpoint using the `set_trace()` function from the `IPython.core.debugger` module. The debugger will be automatically invoked when the call to `set_trace()` is reached during execution:
from IPython.core.debugger import set_trace

def foo():
    for i in range(2):
        set_trace()
        print("printing", i)
foo()
The Python debugger is called `pdb` and it has some special commands of its own which allow you to step through the execution. They can be listed by typing `help` at the debugger prompt (see above), or you can have a look at the documentation. The examples above also illustrate what a typical debugging session looks like (stepping through the program, inspecting variables). When you want to stop debugging, don't forget to quit the debugger with `quit` (or just `q`) at the debugger prompt, or else your Python process will become unresponsive.
### Plotting
Jupyter is tightly integrated with the matplotlib plotting library. Plotting is enabled by running the `%matplotlib` magic with an argument specifying how the notebook should handle graphical output. `%matplotlib notebook` will generate an interactive plot which you can resize, pan, zoom and more. A word of caution though: when using this variant, once you're done with the plot, don't forget to "freeze" it using the ⏻ symbol in the upper right corner, or else subsequent plotting commands from different cells will all draw into this same plot.
%matplotlib notebook
import matplotlib.pyplot as plt
plt.plot(range(10))
By contrast, `%matplotlib inline` will just show a basic plot with a default size:
%matplotlib inline
plt.plot(range(10))
For more information on plotting with matplotlib, see their usage guide.
## Command line programs
The operations listed in the section on magics for working with the file system can of course also be achieved using the corresponding command line programs, so if you know those, no need to memorize the magics. In fact, the magics are often just thin wrappers around the command line programs, which is why they are named the same.
!ls --color -hArtl /etc/nginx
The only functionality that I miss among the magics is the ability to take a quick look at part of a possibly very large text file. This can be done with the `head` command line program, which prints the beginning of a file:
!head jupyter_magic.ipynb
The `-n` option controls how many lines from the beginning of the file should be printed:
!head -n3 jupyter_magic.ipynb regex.ipynb
Similarly, the `tail` program prints the end of files:
!tail -n5 jupyter_magic.ipynb
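If you're on a system without `head` and `tail` (e.g. Windows), the same peeking can be done in a few lines of Python. A sketch, using a generated `sample.txt` stand-in file; `itertools.islice` avoids reading more than needed, and `collections.deque` with `maxlen` keeps only the last lines in memory:

```python
from collections import deque
from itertools import islice

# create a small sample file to peek at
with open("sample.txt", "w") as f:
    f.writelines(f"line {i}\n" for i in range(100))

# like `head -n3`: the first three lines
with open("sample.txt") as f:
    head = list(islice(f, 3))
print("".join(head))

# like `tail -n5`: the last five lines, without loading the whole file
with open("sample.txt") as f:
    tail = list(deque(f, maxlen=5))
print("".join(tail))
```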
Another useful feature of command line execution is that instead of printing the result, you can have it returned as a list of strings corresponding to lines of output. Either by prepending two exclamation marks instead of one:
!!tail -n5 jupyter_magic.ipynb
Or by assigning the expression to a variable:
out = !tail -n5 jupyter_magic.ipynb
out
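In plain Python, the closest counterpart to `out = !command` is the standard library's `subprocess` module, which can also hand you the output as a string ready to be split into lines. A sketch, running the Python interpreter itself so the command exists everywhere Python does:

```python
import subprocess
import sys

# run a command and capture its output, like `out = !command` in IPython
result = subprocess.run(
    [sys.executable, "-c", "print('hello'); print('world')"],
    capture_output=True,
    text=True,
)
lines = result.stdout.splitlines()
print(lines)
```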
## In summary
These are just my favorite shortcuts, the ones I find most helpful. Obviously, there are many more; see `%magic` or `%quickref`. If you think I've missed a really neat one, let me know!