Anonymization of personal data (experience report)


Post by hballin »

At the beginning of my project, I tested various tools for anonymizing log files and evaluated how they behave in practical use. Tools for anonymizing database and image data are also available. I chose log files as the topic because passing them on is a common situation with enormous implications in the IT sector.
The typical scenario looks like this:
  • An IT employee passes log files on to third parties without knowing what information these files contain. Because of the volume of data generated, he also has no opportunity to analyze and evaluate every line and take appropriate measures.
  • The one who suffers is the person or company that would never have disclosed its data to third parties in this form.
  • Not to mention the legal consequences that all of this may entail.
During the tool analysis, I noticed again and again that most of these tools were created out of one specific necessity: "To ensure that collected PII (Personally Identifiable Information) is protected before it leaves the author's environment and cannot fall into unauthorized hands."
Among the large number of tools on offer, two different approaches to anonymizing PII stand out. Some tools anonymize the PII on the local system (usually a laptop or desktop); others first send (upload) the PII outside the author's sphere of control and only then anonymize it.
With the latter approach, the question arises whether it makes sense at all to anonymize PII that has already left the safe space before anonymization.
I think not, and for this reason I will not discuss these online anonymization tools further in this article!
The secure variant of data anonymization is the one that takes place on a local system (which can also be a server within the company).
The experience report looks at the following types of tools for data anonymization:
  • Command-line-based tools
  • UI-based tools
Command-line-based tools
The command-line-based tools carry out their work, as the name suggests, via command-line statements. These statements can also be embedded in batch programs.
The advantage is that they can be integrated into existing, proprietary software applications that require data anonymization. The disadvantage lies in the usability of the tools, which is usually cryptic.
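
To make the idea of a scriptable, local anonymization step more concrete, here is a minimal sketch in Python. It is not taken from any of the tools discussed in this post. The two patterns (IPv4 and e-mail addresses), the placeholders, and the names mask_line and anonymize_stream are assumptions chosen for illustration; real log files need a much broader rule set.

```python
import io
import re
from typing import TextIO

# Hypothetical patterns for this sketch only; they do not cover all PII.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_line(line: str) -> str:
    """Replace e-mail and IPv4 addresses with fixed placeholders."""
    line = EMAIL.sub("<EMAIL>", line)
    return IPV4.sub("<IP>", line)


def anonymize_stream(src: TextIO, dst: TextIO) -> None:
    """Copy src to dst line by line, masking each line on the way."""
    for line in src:
        dst.write(mask_line(line))


# Demo with in-memory streams, so nothing leaves the local process.
sample = io.StringIO("2025-01-01 login user=a.b@example.com from 192.168.1.20\n")
out = io.StringIO()
anonymize_stream(sample, out)
print(out.getvalue(), end="")
```

Calling anonymize_stream(sys.stdin, sys.stdout) instead of the in-memory demo would turn the sketch into a filter that can sit in a batch job, which is the kind of integration the command-line tools are good at.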

Loganon - https://github.com/sys4/loganon
Loganon is a tool for generic log file anonymization. The product description implicitly states that it was developed for use on Linux. Since the core program now uses the Python runtime environment, it can also be used on other OS platforms if necessary. The tool is designed to be generic, so a certain degree of user comfort can only be achieved by adding further command-line tools around it.

Anonymize data - Splunk Documentation - https://docs.splunk.com/Documentation/S ... nymizedata
For the Splunk application, Splunk has developed an anonymization tool that can be used to anonymize PII data before transferring data to Splunk. To use the anonymization tool, a Splunk Enterprise instance must be available.

Microsoft Presidio - https://microsoft.github.io/presidio/
Presidio has proven to be very versatile. Like Loganon, it is based on Python. It can be used for common log files as well as for anonymizing image files. Its user-friendliness did not convince me here either, but I don't think that was Presidio's original intention. I would therefore classify it as a framework for developers, with which they can define and implement their requirements in great detail.

UI-based tools
The UI-based tools perform their work in a graphical user interface. In some cases, the anonymization tools also allow the use of command-line statements.
The advantages are that they can be used with existing software applications that require data anonymization, and that they are user-friendly. The disadvantage is that they are difficult to use on so-called headless systems, that is, systems without a graphical interface.

Log File Anonymizer (b-technology.net) - https://www.b-technology.net/btlogfileanonymizer.html
It is not difficult to see that none of the tools described above could convince me in terms of user-friendliness. For this reason, I have developed a tool with which you can a) anonymize your log files and b) enjoy a good user experience.
In my opinion, the tool has a well-made UI that supports digital accessibility. It can be run both on demand and as a batch process. In addition to many other functions, it offers data encryption, automatic uploads and/or data splitting. These additional functions are optional.

The Anonymization Methods
The following section illustrates the two anonymization methods (data anonymization and data pseudonymization) and clarifies the problem that arises with them.

Data Anonymization
Classic data anonymization, although valid, can distort the result enormously if it is applied too conservatively. The result is still recognizably a house, but important attributes (windows, door, etc.) have been lost. The person you are talking to no longer has an objective picture of which aspect of the house is being discussed.
Image

Data Pseudonymization
Data pseudonymization counteracts this problem. The characteristic features of the two houses are preserved (roof, windows with window crosses, entrance, etc.), although the houses differ from the outside in many attributes. The person you are talking to has an objective picture in mind of which aspect of the house is being discussed.
Image
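
The difference between the two methods can also be sketched on log lines. The following Python example is only an illustration under stated assumptions: it handles IPv4 addresses only, the function names anonymize and pseudonymize are made up for this post, and the keyed hash (HMAC-SHA256, shortened to 8 hex characters) is one possible way to derive stable pseudonyms, not the method of any tool mentioned here.

```python
import hashlib
import hmac
import re

# Hypothetical pattern for this sketch: IPv4 addresses only.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize(line: str) -> str:
    """Classic anonymization: every address becomes the same placeholder."""
    return IPV4.sub("<IP>", line)


def pseudonymize(line: str, key: bytes) -> str:
    """Pseudonymization: the same address always maps to the same token."""

    def replace(match: re.Match) -> str:
        digest = hmac.new(key, match.group(0).encode("utf-8"), hashlib.sha256)
        return "ip-" + digest.hexdigest()[:8]

    return IPV4.sub(replace, line)


log = [
    "10.0.0.5 connect",
    "10.0.0.9 connect",
    "10.0.0.5 timeout",
]
secret = b"local-secret-key"  # assumption: a key that never leaves the local system

for entry in log:
    print(anonymize(entry), "|", pseudonymize(entry, secret))
```

After classic anonymization, the first and third line can no longer be attributed to the same client. After pseudonymization, both lines carry the same token, so a reader can still follow one client through the log without seeing the real address. This corresponds to the house that keeps its roof, windows and entrance.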

Result of the Analysis
The analysis of data anonymization tools and discussions with IT security experts ultimately came down to two methods of data anonymization:
  • Classic data anonymization
  • Data pseudonymization
Experience has shown that when data that has undergone classic data anonymization is handed over to professional software support, a satisfactory solution becomes a distant prospect. With data pseudonymization, by contrast, software support would arrive at the expected, practicable solution much faster, and the problem would be resolved in a timely manner.
Whether data anonymization is carried out with a command-line or a UI-based tool is something each IT manager must decide for themselves. My favourite is clearly the UI-based variant, because it makes working with the tool easier, especially if the tool is used only sporadically.


#DataAnonymization
#DataObfuscation
#Anonymization
#Obfuscation
#PIIData
#PII
#GDPR