Frequent Itemset Mining Algorithm
- Read Data:
Read the transactional data from the "reviews.txt" file.
- Determine Minimum Support Threshold:
Set a minimum support threshold (minsup): the minimum number of occurrences an item must have to be considered frequent.
- Scan Data and Preprocess:
Iterate over each line of the data file. Split each line into individual words and store each line as a list of word strings.
- Generate Candidate Prefixes:
Iterate over the lists of word strings to extract unique words as candidate prefixes. Store these prefixes in a set (database_prefixes).
- Count Prefix Occurrences:
Iterate over the prefixes in database_prefixes. Count the occurrences of each prefix across the entire dataset. If the count is greater than or equal to the minimum support threshold (minsup), write the prefix and its count to the output file "patterns.txt".
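
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the original implementation, and it rests on several assumptions the description does not settle: words are split on whitespace, a prefix is counted once per line that contains it (rather than once per occurrence), and each output line has the form "prefix<TAB>count". The function names (read_transactions, frequent_prefixes, write_patterns, run) and the default minsup of 2 are hypothetical.

```python
from collections import Counter


def read_transactions(path):
    """Read the file and split each non-empty line into a list of word strings."""
    with open(path, encoding="utf-8") as handle:
        # Assumption: whitespace tokenization; blank lines are skipped.
        return [line.split() for line in handle if line.strip()]


def frequent_prefixes(transactions, minsup):
    """Return {prefix: count} for every single-word prefix with count >= minsup."""
    # Unique words across all transactions are the candidate prefixes.
    database_prefixes = {word for words in transactions for word in words}

    # Assumption: a prefix is counted once per line that contains it.
    # Use counts.update(words) instead to count every occurrence.
    counts = Counter()
    for words in transactions:
        counts.update(set(words))

    return {p: counts[p] for p in database_prefixes if counts[p] >= minsup}


def write_patterns(patterns, path):
    """Write one pattern per line, highest count first, ties in alphabetical order."""
    # Assumed output format: "<prefix>\t<count>".
    ordered = sorted(patterns.items(), key=lambda item: (-item[1], item[0]))
    with open(path, "w", encoding="utf-8") as handle:
        for prefix, count in ordered:
            handle.write(f"{prefix}\t{count}\n")


def run(input_path="reviews.txt", output_path="patterns.txt", minsup=2):
    """Chain the steps: read, count, filter by minsup, write."""
    transactions = read_transactions(input_path)
    patterns = frequent_prefixes(transactions, minsup)
    write_patterns(patterns, output_path)
    return patterns
```

Calling run() with "reviews.txt" in the working directory would produce "patterns.txt". Counting with a single pass and a Counter avoids rescanning the whole dataset once per prefix, which is what a literal reading of the last step would do.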