Why even bother?

When working with DataFrames in pandas, I have always missed the ease with which I could select rows and columns in Excel. Simple things like deleting a particular cell or arranging columns were soo incredibly easy and intuitive. Of course Google sheets take this one step further with its fancy editor. I am yet to find such ease of use within pandas.

In today’s post I would like to share a couple of handy techniques I use every soo often to deal with the seemingly simple task of selecting and moving around rows and columns in Pandas.

The Basics

Setting up our base data

Here’s the base code on top of which I will show all examples:

import pandas as pd
import numpy as np
import re
df = pd.read_csv('https://people.sc.fsu.edu/~jburkardt/data/csv/zillow.csv')

Vanilla ways of selecting columns

Using angle brackets

Doing so will create a new df with the columns Index, Beds and Baths


df = df[['Index', 'Beds', 'Baths']]

Using .get

If any column mentioned in the list does not exist, it will return a NONE. This technique is useful for two reasons:

  1. When you need to explicitly and in a human readable way mention which columns you need
  2. When the each of the columns mentioned are absolutely necessary.

If any one column could not be found in the source df, you get no df in the output. Instead you get a nice little ‘None’

df.get(['Index', 'Beds'])

df.get(['Index', 'Beds', 'Swimming Pools'])
#$ None

Filter from existing columns

While ‘.get’ returns a ‘None’ on a nonexistent column, we can still return whatever other columns were found in the source df, sans the one that wasn’t

df[df.columns[df.columns.isin(['Index', 'Beds', 'Swimming Pools'])]]

Filter out unwanted columns

Select everything EXCEPT a particular column or a subset of columns. The following code gives us all columns except “Beds” and “Sqft”

Using Difference

df[df.columns.difference(['Beds', 'Sqft'])]

Using .isin()

df[df.columns[~df.columns.isin(['Beds','Sqft'])]]

Pro coder ways of selecting columns

Selecting based on data type

I recently had the good fortune of working with a huge dataset with a decade worth of transactions. One of the things I had to do was select all columns with some pricing information. This technique is a good candidate to show how we could use some better mindful ways to select data en mass. Before you use this technique, I would also suggest getting to know what the data types of each column with:

df.dtypes

then using another command to automatically converting types or doing it manually

df = df.convert_dtypes()

Select all columns of type number (64 bit integers to be specific)

df.loc[:,(df.dtypes=='int64').values]

Note: the required data was a 3 digit integer hence I choose ‘int'64, else we would use more expansive types like float.

Select a column based on column names

If column name contains a character or word

Here the code says: Give me all rows of df for which each column has the word 'price' in it.


df.loc[:,['price' in i.lower() for i in df.columns]]

If column name starts with a particular character or word (prefix)


df.loc[:,[df.columns.str.startswith('S')]]

If column name ends with a particular character or word (suffix)


df.loc[:,[df.columns.str.endswith('s')]]

Exotic ways of selecting columns

Now this may not be your everyday use case but I still mention it here for reference. I use it every once in a while and always forget how it works. When it works, it works and you wonder why you didn’t know this existed.

Select the columns if the rows meet a certain condition

df.loc[:,[(df[col].mean() > 7) for col in df.columns]]

In this we use this thing called list comprehension where we generate a list of columns that meet a certain condition.

Wrapping up

I hope these techniques were insightful and more than that, you use them in some project soon. I am always on the look out to improve my game, if you think you have something to add– either a whole new technique or changes to an existing bit of code, please hit me up on Github or Twitter at @thisAllwyn . You can also join my telegram Channel to interact with other Pythonistas and extend the conversation there 😄