Stripping Tags from Rich Text Imports Using Python

When importing rich text data into a Data Warehouse (DWH), it’s often necessary to remove unwanted HTML tags while preserving essential formatting. This guide outlines a method using Python’s BeautifulSoup library to clean and structure the data efficiently.

Why Strip HTML Tags?

Rich text from sources like web applications, CMS platforms, and APIs often includes …
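As a rough illustration of the idea, the sketch below uses BeautifulSoup to keep a small whitelist of formatting tags and unwrap everything else. The whitelist `KEEP_TAGS` and the helper names `clean_rich_text` and `to_plain_text` are assumptions made for this example, not the method from the full post.

```python
from bs4 import BeautifulSoup

# Assumption: which tags count as "essential formatting" depends on the target system.
KEEP_TAGS = {"p", "br", "b", "strong", "i", "em", "ul", "ol", "li"}


def clean_rich_text(html: str) -> str:
    """Keep whitelisted tags (without attributes) and unwrap all others."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove script and style elements together with their contents.
    for tag in soup(["script", "style"]):
        tag.decompose()

    for tag in soup.find_all(True):
        if tag.name in KEEP_TAGS:
            tag.attrs = {}   # drop inline styles, classes, ids
        else:
            tag.unwrap()     # remove the tag but keep its children

    return str(soup)


def to_plain_text(html: str) -> str:
    """Strip every tag and return whitespace-normalised text."""
    return BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
```

`clean_rich_text` suits columns that are rendered later as HTML, while `to_plain_text` may be the better fit for columns used in search or reporting.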

Transforming Your Data with Python: CSV to Parquet Conversion and NaN Handling

Efficient data storage and processing are crucial for businesses and organizations dealing with large datasets. Apache Parquet is a popular columnar storage format offering fast query performance and data compression, while CSV is a row-based text format that is less well suited to large-scale processing. This blog post covers how to convert CSV files to Parquet …
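A minimal sketch of such a conversion with pandas might look like the following. The function names, the sample columns, and the fill values are hypothetical, and `DataFrame.to_parquet` needs a Parquet engine such as pyarrow or fastparquet to be installed.

```python
import io

import pandas as pd


def load_and_clean(csv_source, fill_values=None):
    """Read a CSV, drop rows that are entirely empty, and fill remaining NaNs."""
    df = pd.read_csv(csv_source)
    df = df.dropna(how="all")             # remove rows where every column is NaN
    if fill_values:
        df = df.fillna(value=fill_values)  # per-column defaults, e.g. {"score": 0.0}
    return df


def csv_to_parquet(csv_path, parquet_path, fill_values=None):
    """Convert a CSV file to Parquet after NaN handling (requires a Parquet engine)."""
    df = load_and_clean(csv_path, fill_values)
    df.to_parquet(parquet_path, index=False)
    return df


# Small in-memory demonstration of the NaN handling step (hypothetical data).
sample = io.StringIO("id,name,score\n1,Ann,9.5\n2,,\n,,\n")
cleaned = load_and_clean(sample, {"name": "unknown", "score": 0.0})
```

Whether to drop, fill, or keep missing values is a modelling decision; Parquet itself can store nulls, so filling is optional.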

pandas.merge() Function

The merge() function in pandas is a powerful tool for combining two DataFrames based on one or more keys. It is analogous to the JOIN operation in SQL databases and offers various options to customize the merge behavior. Here’s the basic syntax of the merge() function:

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, …
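To make the SQL analogy concrete, here is a small sketch with two made-up DataFrames (`customers` and `orders` are hypothetical) showing how `how`, `left_on`, and `right_on` interact.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]}
)
orders = pd.DataFrame(
    {"order_id": [10, 11, 12], "cust_id": [1, 1, 4], "total": [20.0, 35.5, 12.0]}
)

# Inner join: only customers that have at least one matching order.
inner = pd.merge(
    customers, orders, how="inner", left_on="customer_id", right_on="cust_id"
)

# Left join: every customer is kept; order columns are NaN where nothing matches.
left = pd.merge(
    customers, orders, how="left", left_on="customer_id", right_on="cust_id"
)
```

When both frames share the same key column name, `on="key"` can replace the `left_on`/`right_on` pair.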

Python MySQL

https://www.w3schools.com/python/python_mysql_getstarted.asp

Download and install “MySQL Connector”:

C:\Users\Your Name\AppData\Local\Programs\Python\Python36-32\Scripts>python -m pip install mysql-connector-python
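Once the package is installed, a quick check and a first connection could be sketched as follows. The helper names are made up for this example, and the host, user, and password values passed in are placeholders that must match your own MySQL server.

```python
def connector_available() -> bool:
    """Return True if the mysql-connector-python package can be imported."""
    try:
        import mysql.connector  # noqa: F401
    except ImportError:
        return False
    return True


def open_connection(host: str, user: str, password: str):
    """Open a connection to a MySQL server (requires a running server)."""
    # Imported lazily so this module still loads when the package is missing.
    import mysql.connector

    return mysql.connector.connect(host=host, user=user, password=password)
```

Calling `open_connection("localhost", "yourusername", "yourpassword")` returns a connection object whose `cursor()` method can then be used to run SQL statements.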

Lean Architecture (PHP, Python)

Lean Architecture is the ongoing process of rethinking and improving architectural methodology. It is the pursuit of better work by applying Lean principles to every aspect of practice. It is about smarter information flow and understanding how we perceive and process information in order to be better communicators amongst ourselves and to the users of …