30 days of Data Engineering: Day 3

Sarang Surve
11 min read · Mar 31, 2023


Welcome back, peeps!

If you want to start from the beginning, please visit my first blog.

If you are looking for Day 2, please visit my previous blog.

This is Day 3 of the Data Engineering series where we will be covering some topics based on Advanced Python for Data Engineering.

Pic Credit: alphalogic

3. Advanced Python for Data Engineering

We will cover the following Python topics in detail, with hands-on coding exercises:

  1. Magic Methods
  2. Inheritance and Polymorphism
  3. Errors and Exception Handling
  4. Try and Except
  5. User-defined Exceptions
  6. Garbage Collection
  7. Python Debugger
  8. Python Decorators
  9. Memoization using Decorators
  10. Defaultdict
  11. OrderedDict
  12. Generators in Python
  13. Coroutine in Python

Open up a Colab or Jupyter notebook and start coding.
Let’s dive in!

Magic Methods in Python

In Python, magic methods are special methods whose names start and end with double underscores.

  • Magic methods are not meant to be invoked directly by you; Python invokes them internally when a certain action is performed on the object
  • Examples of magic methods are __new__, __repr__, __init__, __add__, __len__, __del__, etc. For instance, the __init__ method used for initialization is invoked automatically whenever an instance is created
  • Use the dir() function to list the magic methods a class defines or inherits
  • The advantage of using Python’s magic methods is that they provide a simple way to make objects behave like built-in types
  • Magic methods let user-defined objects emulate the behavior of built-in types. So whenever you want to customize how objects of your class print, compare, or respond to operators, use magic methods.

Example:
v = 4
v.__add__(2)  # same as v + 2, returns 6

Implementation —

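# __del__ method
from os.path import expanduser, join

class FileObject:
    def __init__(self, file_path='~', file_name='test.txt'):
        # expanduser() turns '~' into the home directory; open() does not do this itself
        self.file = open(join(expanduser(file_path), file_name), 'rt')

    def __del__(self):
        # Runs when the object is about to be destroyed
        self.file.close()
        del self.file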

Implementation —

# __repr__ method
class String:
    def __init__(self, string):
        self.string = string

    def __repr__(self):
        return 'Object: {}'.format(self.string)
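The list of magic methods above also mentions __add__ and __len__. As a minimal sketch of how they make an object behave like a built-in type, here is a hypothetical Batch class (the class name and its records attribute are made up for this illustration):

```python
class Batch:
    """A hypothetical container of records, used only for illustration."""

    def __init__(self, records):
        self.records = list(records)

    def __len__(self):
        # Called by len(batch)
        return len(self.records)

    def __add__(self, other):
        # Called by batch_a + batch_b; returns a new Batch
        if not isinstance(other, Batch):
            return NotImplemented
        return Batch(self.records + other.records)

    def __repr__(self):
        # Called by repr(batch); print() falls back to it when __str__ is not defined
        return f"Batch({self.records!r})"


merged = Batch([1, 2]) + Batch([3])
print(len(merged))  # 3
print(merged)       # Batch([1, 2, 3])
```

Returning NotImplemented from __add__ lets Python raise a TypeError when a Batch is added to an unrelated type.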

Inheritance and Polymorphism in Python

  • In Python, inheritance and polymorphism are very powerful and important concepts
  • Using inheritance, a child class can use (inherit) all the data fields and methods available in the parent class
  • On top of that, you can add your own methods and data fields
  • Python allows multiple inheritance, i.e., a class can inherit from more than one class
  • Inheritance provides a way to write better-organized code and to re-use code

One of the best articles I have read on class inheritance is by Erdem Isbilen.

Syntax:

class ParentClass:
    # body of parent class
    pass

class DerivedClass(ParentClass):
    # body of derived class
    pass
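Multiple inheritance uses the same syntax with more than one parent class in the parentheses. The sketch below is only an illustration: JsonMixin, Record, and JsonRecord are hypothetical names, not part of any library.

```python
import json


class JsonMixin:
    """Hypothetical mixin that adds JSON serialization to any class."""

    def to_json(self):
        return json.dumps(self.__dict__)


class Record:
    def __init__(self, record_id, source):
        self.record_id = record_id
        self.source = source


class JsonRecord(JsonMixin, Record):
    """Inherits the data fields from Record and to_json() from JsonMixin."""


row = JsonRecord(7, "orders.csv")
print(row.to_json())  # {"record_id": 7, "source": "orders.csv"}

# The method resolution order (MRO) shows where Python looks up attributes
print([cls.__name__ for cls in JsonRecord.__mro__])
# ['JsonRecord', 'JsonMixin', 'Record', 'object']
```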

  • In Python, polymorphism allows a child class to define a method with the same name as a method in its parent class. The version that runs depends on the class of the object the method is called on

Example:
class X:
    def sample(self):
        print("sample() method from class X")

class Y(X):
    def sample(self):
        print("sample() method from class Y")

Implementation —

# Inheritance
class Vehicle:
    def __init__(self, name, color):
        self.__name = name
        self.__color = color

    def getColor(self):
        return self.__color

    def setColor(self, color):
        self.__color = color

    def get_Name(self):
        return self.__name

class Bike(Vehicle):
    def __init__(self, name, color, model):
        super().__init__(name, color)  # call the parent class initializer
        self.__model = model

    def get_details(self):
        # the " " keeps the name and the model apart in the output
        return self.get_Name() + " " + self.__model + " in " + self.getColor() + " color"

b_obj = Bike("Ninja", "green", "ZX-10R")
print(b_obj.get_details())
print(b_obj.get_Name())
print(b_obj.get_Name())

Output —

Ninja ZX-10R in green color
Ninja

Implementation —

# Polymorphism
from math import pi

class Shape:
    def __init__(self, name):
        self.name = name

    def area(self):
        pass

class Square(Shape):
    def __init__(self, length):
        super().__init__("Square")
        self.length = length

    def area(self):
        return self.length**2

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("Circle")
        self.radius = radius

    def area(self):
        return pi*self.radius**2

a = Square(6)
b = Circle(10)
print(a.area())
print(b.area())

Output —

36
314.1592653589793
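Polymorphism pays off when the calling code does not need to know which subclass it is dealing with. Here is a small sketch of that idea; the Source, ListSource, and RangeSource classes are hypothetical and exist only for this example.

```python
class Source:
    """Hypothetical base class: every source must provide read()."""

    def read(self):
        raise NotImplementedError("subclasses must implement read()")


class ListSource(Source):
    def __init__(self, rows):
        self.rows = rows

    def read(self):
        return list(self.rows)


class RangeSource(Source):
    def __init__(self, n):
        self.n = n

    def read(self):
        return list(range(self.n))


sources = [ListSource(["a", "b"]), RangeSource(3)]

# The loop calls read() without checking the concrete type of each source
for source in sources:
    print(type(source).__name__, source.read())
# ListSource ['a', 'b']
# RangeSource [0, 1, 2]
```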

Errors and Exception Handling in Python

In Python, an error can be a syntax error or an exception.

A syntax error occurs when the parser detects an incorrect statement.

  • Exceptions are raised when an event occurs during execution that disrupts the normal flow of the program
  • In other words, an exception occurs whenever syntactically correct Python code results in an error at runtime
  • Python comes with various built-in exceptions, and you can also create user-defined exceptions

Some of Python’s built-in exceptions are:
1.) IndexError: When a sequence index is out of range
2.) ImportError: When an imported module is not found
3.) KeyError: When a key is not found in a dictionary
4.) NameError: When a variable is not defined
5.) MemoryError: When a program runs out of memory
6.) TypeError: When a function or operation is applied to an object of an inappropriate type
7.) AssertionError: When an assert statement fails
8.) AttributeError: When an attribute reference or assignment fails
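To see a few of these exceptions in action, the sketch below triggers them on purpose and reports the name of each one. The helper name_of_error and the samples dictionary are made up for this illustration.

```python
def name_of_error(action):
    """Run a zero-argument callable and return the name of the exception it raises."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None


samples = {
    "index": lambda: [1, 2, 3][5],                 # index out of range
    "key": lambda: {"a": 1}["b"],                  # missing dictionary key
    "type": lambda: "1" + 1,                       # str + int is not supported
    "attribute": lambda: (42).missing_attribute,   # int has no such attribute
    "name": lambda: undefined_variable,            # name was never defined
}

for label, action in samples.items():
    print(label, "->", name_of_error(action))
# index -> IndexError
# key -> KeyError
# type -> TypeError
# attribute -> AttributeError
# name -> NameError
```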

Try and Except in Python

In Python, exceptions can be handled using a try statement

  • The block of code that can raise an exception is placed inside the try clause. The code that handles the exception is written in the except clause
  • If no exception occurs, the except block is skipped and the program's normal flow continues
  • A try statement can have any number of except clauses to handle different exceptions, but at most one of them is executed when an exception occurs
  • We can also raise exceptions ourselves using the raise keyword
  • The try statement can have an optional finally clause that executes regardless of what happens in the try and except blocks

Example:
try:
    print(a)  # a is not defined, so this raises NameError
except NameError:
    print("Something went wrong")
finally:
    print("Exit")

Implementation —

# try, except, finally

try:
    print(1 / 0)
except ZeroDivisionError:
    # catch the specific exception rather than using a bare except
    print("Error occurred")
finally:
    print("Exit")

Output —

Error occurred
Exit
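The points about multiple except clauses, the raise keyword, and finally can be combined in one short sketch. The function names to_int and require_positive are hypothetical; the else clause used here is a standard part of the try statement that runs only when no exception occurred.

```python
def to_int(raw):
    """Convert raw to int, returning None when the conversion fails."""
    try:
        value = int(raw)
    except ValueError:
        # e.g. int("abc")
        print(f"not a number: {raw!r}")
        return None
    except TypeError:
        # e.g. int(None)
        print(f"unsupported type: {type(raw).__name__}")
        return None
    else:
        # runs only when the try block raised nothing
        return value
    finally:
        # runs in every case, even when a return was hit above
        print("done")


def require_positive(value):
    """Raise an exception explicitly with the raise keyword."""
    if value <= 0:
        raise ValueError("value must be positive")
    return value


print(to_int("42"))   # done, then 42
print(to_int("abc"))  # not a number: 'abc', done, then None
print(to_int(None))   # unsupported type: NoneType, done, then None
```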

User-defined Exceptions

In Python, you can define your own errors by creating a new exception class.

  • Exceptions need to be derived from the Exception class, either directly or indirectly
  • Exceptions are raised when an event occurs that disrupts the normal flow of the program
  • User-defined exceptions can be implemented by raising an exception explicitly, by using an assert statement, or by defining custom exception classes
  • Use assert statements to enforce constraints in the program. When the condition given in the assert statement is not met, the program raises an AssertionError
  • You can raise an existing exception by using the raise keyword followed by the name of the exception
  • To create a custom exception class with its own error message, derive it from the Exception class, either directly or through another exception class
  • When creating a module that can raise several distinct errors, a common practice is to create a base class for the exceptions defined by that module and to subclass it for each specific error condition. These are called hierarchical custom exceptions

Example:
class ClassName(Exception):
    pass

Implementation —

class Error(Exception):
    """Base class for the exceptions in this example."""
    pass

class TooSmallValueError(Error):
    """Raised when the entered value is below the threshold."""
    pass

number = 100

while True:
    try:
        num = int(input("Enter a number: "))
        if num < number:
            raise TooSmallValueError
        break
    except TooSmallValueError:
        print("Value too small")

Output —

Enter a number: 40
Value too small
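Hierarchical custom exceptions can also carry their own error message and extra data. The following is a sketch under assumed names (PipelineError, ValueTooSmallError, ValueTooLargeError, and check are all hypothetical):

```python
class PipelineError(Exception):
    """Base class for all errors raised by this hypothetical module."""


class ValueTooSmallError(PipelineError):
    def __init__(self, value, minimum):
        super().__init__(f"{value} is below the minimum of {minimum}")
        self.value = value
        self.minimum = minimum


class ValueTooLargeError(PipelineError):
    def __init__(self, value, maximum):
        super().__init__(f"{value} is above the maximum of {maximum}")
        self.value = value
        self.maximum = maximum


def check(value, minimum=100, maximum=1000):
    if value < minimum:
        raise ValueTooSmallError(value, minimum)
    if value > maximum:
        raise ValueTooLargeError(value, maximum)
    return value


for candidate in (40, 5000, 500):
    try:
        check(candidate)
    except PipelineError as exc:
        # catching the base class handles every error of the module
        print(type(exc).__name__, "-", exc)
    else:
        print(candidate, "accepted")
# ValueTooSmallError - 40 is below the minimum of 100
# ValueTooLargeError - 5000 is above the maximum of 1000
# 500 accepted
```

Callers that care about one specific condition can still catch ValueTooSmallError on its own.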

Garbage Collection in Python

In Python, garbage collection is a memory management feature: the process of reclaiming memory that a running program allocated for objects it no longer needs, so that the memory can be reused.

  • Garbage collection works automatically, so you rarely have to free memory by hand
  • You can force a collection by calling the collect() function of the gc module
  • In CPython, when the last reference to an object goes away, the object is destroyed right away and its __del__() method is executed. Objects that only refer to each other (reference cycles) are found and freed by the cyclic garbage collector in the gc module

Example:
import gc
gc.collect()

Implementation —

# manual garbage collection

import sys, gc

def test():
    # a list that contains itself forms a reference cycle
    items = [18, 19, 20, 34, 78]
    items.append(items)

def main():
    print("Garbage Creation")
    for i in range(5):
        test()
    print("Collecting..")
    n = gc.collect()
    print("Unreachable objects collected by GC:", n)
    print("Uncollectable garbage list:", gc.garbage)

if __name__ == "__main__":
    main()
    sys.exit()

Output —

Garbage Creation
Collecting..
Unreachable objects collected by GC: 33
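The difference between an object that is freed as soon as its last reference disappears and one that is stuck in a reference cycle can be observed with __del__. This sketch assumes CPython (other interpreters may finalize objects at different times); the Node class and the finalized list are made up for the example.

```python
import gc

finalized = []  # names of the objects whose __del__ has run


class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)


# Case 1: no cycle, so the object is destroyed as soon as the name is deleted
single = Node("single")
del single
print(finalized)  # ['single'] on CPython

# Case 2: two objects that refer to each other keep each other alive
left, right = Node("left"), Node("right")
left.partner, right.partner = right, left
del left, right

# The cyclic garbage collector finds and frees the unreachable pair
gc.collect()
print(sorted(finalized))  # ['left', 'right', 'single']
```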

Python Debugger

Debugging is the process of locating and fixing errors in a program. In Python, pdb, which is part of the standard library, is used to debug code.

  • The pdb module internally makes use of the bdb and cmd modules
  • It supports setting breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, etc.

Syntax:
import pdb
pdb.set_trace()

  • Since Python 3.7 there is also a built-in function called breakpoint(), which by default does the same as import pdb; pdb.set_trace()

Implementation —

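import pdb

def multiply(a, b):
    answer = a * b
    return answer

pdb.set_trace()  # execution pauses here and the (Pdb) prompt appears
a = int(input("Enter first number: "))
b = int(input("Enter second number: "))
product = multiply(a, b)  # named product so the built-in sum() is not shadowed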
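The built-in breakpoint() can be left in code and switched off through the PYTHONBREAKPOINT environment variable, which the default breakpoint hook checks each time it is reached. In the sketch below the variable is set to "0" purely so the snippet runs unattended; that line is an assumption of this example, not something a debugging session needs.

```python
import os

# Assumption for this sketch: disable breakpoints so the code runs to completion.
# Remove this line (or unset the variable) to actually stop in pdb.
os.environ["PYTHONBREAKPOINT"] = "0"


def multiply(a, b):
    answer = a * b
    breakpoint()  # with the default hook this opens pdb here; a no-op when PYTHONBREAKPOINT=0
    return answer


print(multiply(6, 7))  # 42
```

Once inside the (Pdb) prompt, the most used commands are n (next line), s (step into), c (continue), p expression (print a value), l (list source), and q (quit).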

Decorators in Python

In Python, a decorator is any callable that is used to modify a function or a class. It takes a function, adds some functionality, and returns the modified function.

  • Decorators are a very powerful and useful tool in Python since they allow programmers to modify or control the behavior of a function or class.
  • In a decorator, the function is passed as an argument to another function and then called inside a wrapper function.
  • A decorator is usually applied with the @ syntax on the line just before the definition of the function you want to decorate.

There are two different kinds of decorators in Python:
1. Function decorators
2. Class decorators

  • When multiple decorators are stacked on a single function, they are applied from the bottom up: the decorator closest to the function is applied first
  • A decorator can be reused by applying it to any number of functions

Implementation —

# Decorators
def test_decorator(func):
    def function_wrapper(x):
        print("Before calling " + func.__name__)
        res = func(x)
        print(res)  # the wrapper prints the result instead of returning it
        print("After calling " + func.__name__)
    return function_wrapper

@test_decorator
def sqr(n):
    return n**2

sqr(20)

Output —

Before calling sqr
400
After calling sqr

Implementation —

# Multiple Decorators
def lowercase_decorator(function):
    def wrapper():
        func = function()
        make_lowercase = func.lower()
        return make_lowercase
    return wrapper

def split_string(function):
    def wrapper():
        func = function()
        split_string = func.split()
        return split_string
    return wrapper

# lowercase_decorator is applied first, then split_string
@split_string
@lowercase_decorator
def test_func():
    return 'MOTHER OF DRAGONS'

print(test_func())

Output —

['mother', 'of', 'dragons']
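The wrappers above only work for functions with a fixed number of arguments. A more general pattern, sketched here with a hypothetical timed decorator, accepts *args and **kwargs, returns the original result, and uses functools.wraps from the standard library to keep the decorated function's name and docstring.

```python
import functools
import time


def timed(func):
    """Hypothetical decorator that reports how long a call took."""

    @functools.wraps(func)  # copy __name__, __doc__, etc. from func to wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f}s")
        return result  # pass the original return value through

    return wrapper


@timed
def total(values, scale=1):
    """Sum values and multiply by scale."""
    return sum(values) * scale


print(total([1, 2, 3], scale=10))  # prints the timing line, then 60
print(total.__name__)              # total
```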

Memoization using Decorators

In Python, memoization is a technique that allows you to optimize a Python function by caching its output based on the parameters you supply to it.

  • Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.
  • If you want to speed up the parts in your program that are expensive, memoization can be a great technique to use.

One of the best articles I have read about decorators is by Hensle Joseph.

There are four approaches to memoization:
1. Using global
2. Using objects
3. Using default parameter
4. Using a Callable Class

Implementation —

# Fibonacci series using memoization with decorators
def memoization_func(t):
    dict_one = {}  # cache: argument -> computed result

    def h(z):
        if z not in dict_one:
            dict_one[z] = t(z)
        return dict_one[z]
    return h

@memoization_func
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

print(fib(20))

Output —

6765
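The standard library already ships a memoizing decorator, functools.lru_cache, so a hand-written cache is not always needed. A minimal sketch using the same Fibonacci function:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # maxsize=None means the cache can grow without bound
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(30))           # 832040
print(fib.cache_info())  # shows hits, misses, maxsize and current cache size
```

Because every value from fib(0) to fib(30) is computed only once, the recursion finishes almost instantly even for inputs where the plain version would take noticeably longer.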

Defaultdict

In Python, a dictionary is a container that holds key-value pairs. Keys must be unique, hashable (typically immutable) objects.

  • If you try to read a key that doesn’t exist in a regular dictionary, it raises a KeyError and stops your code. To tackle this, Python provides defaultdict, a dictionary-like class in the collections module
  • If you access a missing key with d[key], a defaultdict automatically creates the key and generates a default value for it
  • As long as a default factory is set, indexing a defaultdict with d[key] never raises a KeyError
  • Any key that does not exist gets the value returned by the default factory
  • Hence, whenever you need a dictionary in which each element’s value should start with a default value, use a defaultdict

Syntax:
from collections import defaultdict
demo = defaultdict(int)

Implementation —

from collections import defaultdict

default_dict_var = defaultdict(list)

for i in range(10):
    default_dict_var[i].append(i)

print(default_dict_var)

Output —

defaultdict(<class 'list'>, {0: [0], 1: [1], 2: [2], 3: [3], 4: [4], 5: [5], 6: [6], 7: [7], 8: [8], 9: [9]})
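Two everyday uses of defaultdict are counting and grouping. The sketch below uses a made-up list of (source, value) events to show both, plus the fact that get() and the in operator do not trigger the default factory.

```python
from collections import defaultdict

# Hypothetical sample data: (source, value) pairs
events = [("web", 3), ("api", 5), ("web", 7), ("batch", 1), ("api", 2)]

counts = defaultdict(int)    # missing keys start at 0
grouped = defaultdict(list)  # missing keys start as an empty list

for source, value in events:
    counts[source] += 1
    grouped[source].append(value)

print(dict(counts))   # {'web': 2, 'api': 2, 'batch': 1}
print(dict(grouped))  # {'web': [3, 7], 'api': [5, 2], 'batch': [1]}

# get() and "in" do not call the default factory and do not create keys
print(counts.get("missing"))  # None
print("missing" in counts)    # False
```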

OrderedDict

In Python, OrderedDict is one of the container datatypes in the collections module and a subclass of dict. It maintains the order in which keys are inserted, and that order is used when iterating. If a key is deleted and inserted again, it moves to the end.

  • It’s a dictionary subclass that remembers the order in which its contents are added
  • When the value of an existing key is changed or overwritten, the position of that key does not change
  • popitem() removes items in LIFO order by default; popitem(last=False) removes them in FIFO order
  • The reversed() function can be used with OrderedDict to iterate elements in the reverse order
  • OrderedDict has a move_to_end() method to efficiently reposition an element to either end

Example:
from collections import OrderedDict
my_dict = {'Sunday': 0, 'Monday': 1, 'Tuesday': 2}
# creating ordered dict
ordered_dict = OrderedDict(my_dict)
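The ordering behavior described above can be checked with a few calls. This is a small sketch using the same day names; the variable names are only illustrative.

```python
from collections import OrderedDict

days = OrderedDict([("Sunday", 0), ("Monday", 1), ("Tuesday", 2)])

days["Monday"] = 10  # overwriting a value keeps the key's position
after_update = list(days)
print(after_update)  # ['Sunday', 'Monday', 'Tuesday']

days.move_to_end("Sunday")  # move a key to the end
after_move = list(days)
print(after_move)  # ['Monday', 'Tuesday', 'Sunday']

days.move_to_end("Sunday", last=False)  # move it back to the front
print(list(reversed(days)))  # ['Tuesday', 'Monday', 'Sunday']

last_item = days.popitem()             # LIFO: removes the last item
first_item = days.popitem(last=False)  # FIFO: removes the first item
print(last_item, first_item)  # ('Tuesday', 2) ('Sunday', 0)
print(list(days.items()))     # [('Monday', 10)]
```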

Generators in Python

In Python, generator functions look just like regular functions with one difference: they use the yield keyword instead of return. A generator function is a function that returns an iterator. A generator expression is an expression that also returns an iterator.

  • Generator objects are consumed either by calling the built-in next() function on the generator object or by using the generator object in a “for in” loop.
  • A return statement terminates a function entirely, but a yield statement pauses the function, saving all its state, and later continues from there on successive calls.
  • Generator expressions can be used as function arguments. Just like list comprehensions, generator expressions allow you to create a generator object in a single line of code.
  • The major difference between a list comprehension and a generator expression is that a list comprehension produces the entire list at once, while a generator expression produces one item at a time (lazy evaluation). For this reason, a generator expression is much more memory efficient than the equivalent list comprehension.

Example:
def generator():
    yield "x"
    yield "y"

for i in generator():
    print(i)

Implementation —

def test_sequence():
    num = 0
    while num < 10:
        yield num
        num += 1

for i in test_sequence():
    print(i, end=",")

Output —

0,1,2,3,4,5,6,7,8,9,

Implementation —

# Python generator with loop
# Reverse a string
def reverse_str(test_str):
    length = len(test_str)
    for i in range(length - 1, -1, -1):
        yield test_str[i]

for char in reverse_str("Trojan"):
    print(char, end=" ")

Output —

n a j o r T

Implementation —

# Generator Expression
# Initialize the list
test_list = [1, 3, 6, 10]
# list comprehension
list_comprehension = [x**3 for x in test_list]
# generator expression
test_generator = (x**3 for x in test_list)
print(list_comprehension)
print(type(test_generator))
print(tuple(test_generator))

Output —

[1, 27, 216, 1000]
<class 'generator'>
(1, 27, 216, 1000)
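Generators are handy for processing data in chunks without loading everything into memory. The sketch below is one possible way to do that; read_numbers and batched are hypothetical helper names, with read_numbers standing in for a real data source.

```python
def read_numbers(limit):
    """Stand-in for a data source: yields 0 .. limit-1 one at a time."""
    for n in range(limit):
        yield n


def batched(items, size):
    """Group any iterable into lists of at most `size` items, lazily."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # the last, possibly shorter, batch


gen = read_numbers(3)
print(next(gen))  # 0
print(next(gen))  # 1

print(list(batched(read_numbers(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]

# A generator expression passed directly as a function argument
print(sum(x * x for x in read_numbers(4)))  # 14
```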

Coroutine in Python

  • Coroutines are program components that generalize subroutines for non-preemptive multitasking by allowing execution to be suspended and resumed
  • Because coroutines can pause and resume their execution context, they’re well suited to concurrent processing
  • A coroutine is a special type of function that yields control back to the caller without ending its context; instead, it keeps its state and waits in an idle state
  • In a coroutine, the yield expression can also be used on the right-hand side of an = operator to signify that it will accept a value at that point. The caller passes the value in with the send() method

Implementation —

def func():
    print("My first Coroutine")
    while True:
        var = (yield)  # pauses here until a value is sent in
        print(var)

coroutine = func()
next(coroutine)  # run up to the first yield

Output —

My first Coroutine
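The example above starts the coroutine but never sends it a value. As a sketch of the send() side, here is a hypothetical running_average coroutine that receives numbers and yields the average seen so far:

```python
def running_average():
    """Receive numbers through send() and yield the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # hand back the average, then wait for the next value
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)  # prime the coroutine: run it up to the first yield

print(averager.send(10))  # 10.0
print(averager.send(20))  # 15.0
print(averager.send(30))  # 20.0

averager.close()  # stop the coroutine when it is no longer needed
```

The call to next() is required before the first send(), because a freshly created generator has not yet reached a yield that could receive a value.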

That’s it for now!

Please visit my next blog for Day 4 and more.

Follow for more updates & Stay tuned!
Keep learning and coding
