Throughout my journey at the Flatiron School so far, one of the most useful techniques I have learned is manipulating DataFrames. As many of you will see, the bulk of the work you will be doing involves DATA! If you are not familiar with how to manipulate data the way you want, it will be very difficult to perform modeling and analysis on your chosen dataset.
Today I will show you two pandas DataFrame indexers, loc and iloc, and how you can use them to select exactly the data you need for your next task or project.
Before diving into loc and iloc, let’s clarify the differences between the two.
- iloc is integer position-based, meaning that you specify rows and columns by their integer positions. Just as the first element of a list is at position 0, the first row (or column) of a DataFrame is at position 0.
- loc, on the other hand, is label-based, meaning that you specify rows and columns by their index and column labels rather than by their integer positions.
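Here is a minimal sketch that makes the contrast concrete. It uses a tiny, made-up DataFrame whose index labels are the strings "a", "b" and "c" instead of the default 0, 1, 2. With this index, iloc understands only positions and loc understands only labels, so the two ways of selecting the same cell look different.

```python
import pandas as pd

# Made-up example: string index labels instead of the default 0, 1, 2,
# so positions and labels visibly differ.
df = pd.DataFrame(
    {"length": [5.1, 4.9, 4.7], "width": [3.5, 3.0, 3.2]},
    index=["a", "b", "c"],
)

# iloc: integer positions for rows and columns (both start at 0)
row_by_position = df.iloc[0]        # first row, whose label is "a"
cell_by_position = df.iloc[1, 1]    # second row, second column

# loc: index labels for rows, column labels for columns
row_by_label = df.loc["a"]
cell_by_label = df.loc["b", "width"]  # the same cell as df.iloc[1, 1]

print(cell_by_position, cell_by_label)  # 3.0 3.0
```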
To demonstrate today’s methods, I will be using the iris data set from sklearn.
Let’s take a look at the DataFrame:
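The original view of the DataFrame did not survive. Below is a minimal sketch of how it can be built, assuming scikit-learn's load_iris() and keeping only the four measurement columns.

```python
import pandas as pd
from sklearn.datasets import load_iris

# load_iris() returns a Bunch: .data is a (150, 4) NumPy array and
# .feature_names holds the matching column names.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

print(df.shape)   # (150, 4)
print(df.head())  # first five rows, default index 0..4
```

The iris data ships with scikit-learn, so no download is needed. The DataFrame gets the default index, 0 through 149.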
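1. Selecting a single row:
For example, the third row:
df.loc[2]: although loc is label-based, the integer 2 works here because the DataFrame has the default index, whose labels are the integers 0, 1, 2, and so on. The row labelled 2 is therefore also the third row.
df.iloc[2]: returns the row at integer position 2, which is always the third row, whatever its label.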
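The sketch below repeats these selections on the first six rows of iris, typed in by hand so the snippet runs without scikit-learn. It also does two more things. First, it adds a column selector, which narrows the result to a single value. Second, it shows why loc and iloc only agree here because of the default index: after sorting, label 2 and position 2 point at different rows.

```python
import pandas as pd

# First six iris rows, hand-typed; the default index is 0..5.
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    "petal length (cm)": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "petal width (cm)": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
})

# With the default index, label 2 and position 2 are the same row.
row_by_label = df.loc[2]
row_by_position = df.iloc[2]

# Adding a column selector returns one cell, i.e. a single value.
cell_by_label = df.loc[2, "sepal length (cm)"]
cell_by_position = df.iloc[2, 0]

# After sorting, labels and positions no longer line up.
sorted_df = df.sort_values("sepal length (cm)")
print(sorted_df.loc[2, "sepal length (cm)"])  # row labelled 2: 4.7
print(sorted_df.iloc[2, 0])                   # third row after sorting: 4.9
```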
2. Selecting a range using slicing:
Let’s say we want to select the first 5 rows and return only their sepal length and width.
First, let’s rename the columns so that they are easier to slice by name:
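The exact names used in the original post were not preserved, so the snake_case names below are a hypothetical choice. This minimal sketch uses DataFrame.rename(), again on the hand-typed first six iris rows.

```python
import pandas as pd

# First six iris rows, hand-typed, with scikit-learn's original column names.
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    "petal length (cm)": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "petal width (cm)": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
})

# Hypothetical short names: no spaces or units, so they are easy to type in slices.
df = df.rename(columns={
    "sepal length (cm)": "sepal_length",
    "sepal width (cm)": "sepal_width",
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
})

print(df.columns.tolist())
```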
Back to our task of selecting the sepal length and width of the first 5 rows:
Using loc:
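Here is a sketch of the loc version. It assumes the hypothetical snake_case column names from the renaming step and uses the hand-typed first six iris rows. Note that loc slices include both endpoints, so 0:4 covers five rows.

```python
import pandas as pd

# First six iris rows (hand-typed) with the hypothetical snake_case column names.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
})

# Label-based slice, both ends included: row labels 0..4 and
# every column from sepal_length through sepal_width.
first_five = df.loc[0:4, "sepal_length":"sepal_width"]

# Same result with an explicit list of column labels.
first_five_list = df.loc[0:4, ["sepal_length", "sepal_width"]]

print(first_five.shape)  # (5, 2)
```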
Using iloc:
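This is the iloc equivalent, under the same assumptions (hand-typed first six iris rows, hypothetical snake_case names). Here slices follow Python's usual rule and exclude the end, so the first five rows are 0:5 and the first two columns are 0:2.

```python
import pandas as pd

# First six iris rows (hand-typed) with the hypothetical snake_case column names.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
})

# Position-based slice, end excluded: row positions 0..4 and column positions 0..1.
first_five = df.iloc[0:5, 0:2]   # df.iloc[:5, :2] is equivalent

print(first_five.shape)  # (5, 2)
```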
Now let’s get to my favorite part of using loc and iloc. The beauty of these methods is that they can also accept conditions (boolean masks), not just labels or positions. This makes them very helpful for finding outliers and for selecting exactly the data you need.
For instance, let’s say that we want to return all the flowers with sepal length greater than 4.
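Notice that for iloc we needed to convert the condition into a list. iloc cannot accept a boolean Series, because a Series carries its own index and iloc will not use it as a mask. It does, however, accept a plain list or NumPy array of booleans.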
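Here is a sketch of both versions under the same assumptions (hand-typed first six iris rows, hypothetical snake_case names). The try/except only checks the claim that iloc rejects the Series itself. Depending on the index type and pandas version, the error is a ValueError or a NotImplementedError.

Also worth knowing: the smallest sepal length in the full iris data is 4.3 cm, so this particular condition keeps all 150 flowers. A threshold such as 5 would actually filter rows.

```python
import pandas as pd

# First six iris rows (hand-typed) with the hypothetical snake_case column names.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
})

mask = df["sepal_length"] > 4   # boolean Series aligned on df's index

# loc takes the boolean Series directly.
with_loc = df.loc[mask]

# iloc needs plain booleans: a list (or mask.to_numpy()).
with_iloc = df.iloc[mask.tolist()]

# Passing the Series itself to iloc raises an error.
try:
    df.iloc[mask]
    iloc_took_series = True
except (ValueError, NotImplementedError):
    iloc_took_series = False

print(len(with_loc), iloc_took_series)  # 6 False
```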
Another great feature of these methods is that they can accept multiple conditions:
Now we want sepal length > 4 and sepal width below 3.5:
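Here is a sketch under the same assumptions (hand-typed first six iris rows, hypothetical snake_case names). In pandas, conditions are combined with & (and), | (or) and ~ (not) rather than Python's and/or/not. Each condition also needs its own parentheses, because & binds more tightly than comparison operators.

```python
import pandas as pd

# First six iris rows (hand-typed) with the hypothetical snake_case column names.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
})

# Each comparison gets its own parentheses before combining with &.
mask = (df["sepal_length"] > 4) & (df["sepal_width"] < 3.5)

both_loc = df.loc[mask]
both_iloc = df.iloc[mask.to_numpy()]   # iloc again needs plain booleans

print(both_loc.index.tolist())  # [1, 2, 3]
```

The row labelled 0 is left out because its sepal width is exactly 3.5, and the condition is strict. Writing "and" instead of & would raise a ValueError, since pandas refuses to reduce a whole Series to a single True or False.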
As you can see, there are endless combinations of selections you can make with loc and iloc. The more you master these methods, the easier it will be to clean data and manipulate DataFrames. So far in my short data science journey, I believe these pandas methods are some of the most important skills to master early on.