Your SAP Data Clean-up Superhero: Microsoft Excel
By Clinton Jones on Aug 10, 2016
Re-posted with permission from Winshuttle partner Razorleaf
Everyone knows Microsoft Excel, but very few people realize it is a fully functional data tool with over 340 functions ranging from math to text to counting and searching and on and on. You can think of Excel as a database teenager – it likes to feel free as a bird, it thinks it knows everything, and every now and then it surprises you with its capabilities.
Microsoft Excel = Cheap ETL
Database purists would say that a spreadsheet is a poor substitute for a database (if you’re one of these folks, just think of Microsoft Excel as a cheap ETL -Extract/Transform/Load – tool). But what Excel lacks in structure, discipline, and simultaneous access support, it more than makes up for in speed and ease-of-use. Ease-of-use isn’t necessarily the number one necessity in data manipulation, but in this case, ease-of-use means that a single click of the mouse can fix 150 records of abused and misused data. Excel provides a slew of options excellent for collecting, reformatting, and validating information extracted from, or destined for, your database. Here are just a few functions you should know about:
Filtering in Excel is beyond easy. By enabling the autofilter, each column has a pull-down next to its header that can show a list of all the values in that column (think “SELECT DISTINCT”). You can utilize SELECT ALL, or pick and choose. An option at the bottom will allow you to show only the blanks (very handy when cleaning incomplete information), and you even have the ability to filter with options like “contains” and “begins with.” You also have the option to select ALL…EXCEPT just as easily.
Sorting in Excel is even easier than filtering, and just as helpful. You can sort with or without a header row on any combination of columns. You can use sorting and filtering in combination to quickly and easily present data subsets to complete or update.
Find & Replace
Find, and Find & Replace are also great commands (with infrequently recognized database equivalents). Not only can you blast a change very quickly through a massive amount of data, but when you select a range (even a non-contiguous range) of cells, the Find/Replace will act only on those cells. This (combined with filtering and sorting) allows you to do very carefully targeted replacements on subsets faster than you can say “UPDATE mytable SET field1 = REPLACE(field1, ‘searchstring’, ‘replacestring’) WHERE …”
Data validation allows you to limit what values can be put into a cell. When you select a column, you can tell Excel to let you pick from a list of options (no more typo errors), limit the range of values, force a specific data type, and so on. This is very helpful when you are adding or modifying information in a table.
Formatting in Excel makes things pretty, but Conditional Formatting makes things stand out. With a few quick clicks, you can highlight any cell that meets a criteria, doesn’t meet a criteria, exceeds a value, or even falls inside/outside of a range. In Excel 2007, you can put color scales on values, highlight the top 10%, and even rank values. This formatting is especially handy when you’re “calculating” values using one of Excel’s hundreds of functions.
Autofill is a functionality that allows you to grab the corner of a selection set and have Excel continue the pattern of the set. Excel is very good at determining trends and propagating them across vast amounts of data very quickly (something that typically requires programming or complex logic with databases).
Are you thinking this sounds great, but curious how to get your data into Excel? Or wondering how to get freshly laundered data back into your database? With Excel’s ODBC capabilities (check out the options in Data>Get External Data) it’s very easy to pull information from any database that you would like.
Importing a spreadsheet back into a database is just as easy using a neutral file format (csv, txt, XML, etc.) or through a quick tool like Microsoft Access. When it comes time to build a database table, cleanse the information in a database table, or validate the information in a database, consider using a tool you already have – Microsoft Excel. Don’t worry, I won’t tell anyone you used it.
*The original version of this post was written by Paul Gimbel a member of the Razorleaf senior management team and “feet-on-the-streets” consultant.
Winshuttle products, and Studio in particular work with Excel to complete rounding out your medium volume and medium complexity ETL scenarios for working with SAP.
About the author
The Winshuttle blog is written by professional thought leaders who are dedicated to providing content on a variety of topics, including industry news, best practices, software updates, continued education, tips and techniques, and much more.
Questions or comments about this article?
Tweet @Winshuttle to continue the conversation!