大量データを文字列比較してグループ化 【Excel VBA】
↓
Public find_word As String Public find_word_len As Integer Public MaxRow As Integer Public MaxRow2 As Integer Public NewWorkSheet As Worksheet Public BaseWorkSheet As Worksheet Public StartTime As Single '処理の開始時刻を格納する変数領域 Public EndTime As Single '処理の終了時刻を格納する変数領域 Dim atime, ztime As Variant Dim Start As Single Dim Finish As Single '大量データを文字列比較してグループ化 Public Sub message_grp() Start = Timer Application.Calculation = xlManual Application.ScreenUpdating = False Set BaseWorkSheet = ActiveSheet '出力用シート作成 Set NewWorkSheet = Worksheets.Add() NewWorkSheet.Name = "類似文字列集計" NewWorkSheet.Move After:=BaseWorkSheet NewWorkSheet.Cells(1, 1) = "No" NewWorkSheet.Cells(1, 2) = "出力回数" NewWorkSheet.Cells(1, 3) = "文字列" NewWorkSheet.Cells(1, 4) = "キー番号" BaseWorkSheet.Activate BaseWorkSheet.Columns(16384).Clear '16384列目に処理済みフラグを記載するのでクリアする。 MaxRow = Range("A1").End(xlDown).Row m = 2 '最下行の文字列まで処理を実施 For i = 2 To MaxRow StartTime = Timer t = 0 'グループ化済みフラグがたっているものはスキップ If Cells(i, 16384) = "" Then '対象文字列の左から200文字を比較対象とする find_word = Left(Cells(i, 1), 200) find_word2 = Cells(i, 1) NewWorkSheet.Cells(m, 1) = "=ROW()-1" NewWorkSheet.Cells(m, 2) = 1 NewWorkSheet.Cells(m, 3) = find_word2 NewWorkSheet.Cells(m, 4) = m - 1 For ii = i + 1 To MaxRow '処理済みフラグがたっている場合はスキップ If Cells(ii, 16384) <> 1 Then diff_count = lsDist(find_word, Left(Cells(ii, 1), 200)) diff_Percent = diff_count / 100 '文字列差分が10%以内の場合、同一メッセージとしてみなす If diff_Percent < 0.1 Then NewWorkSheet.Cells(m, 2) = NewWorkSheet.Cells(m, 2) + 1 BaseWorkSheet.Cells(ii, 16384) = NewWorkSheet.Cells(m, 1) End If End If Next ii m = m + 1 End If Next i '体裁調整 MaxRow2 = NewWorkSheet.Range("A1").End(xlDown).Row NewWorkSheet.Range("A1:D" & MaxRow2).Borders.LineStyle = True NewWorkSheet.Range("A1:D1").Interior.Color = RGB(204, 255, 255) NewWorkSheet.Range("A2:D" & MaxRow2).Sort Key1:=NewWorkSheet.Cells(2, 2), order1:=xlDescending NewWorkSheet.Range("B2:C" & MaxRow2).NumberFormatLocal = "0 件" NewWorkSheet.Range("A:D").Columns.AutoFit Application.Calculation = xlCalculationAutomatic Application.ScreenUpdating = True Finish = Timer MsgBox ("かかった時間は" & Finish - Start & "秒です") End Sub '************************************************************************ '* (VBA)文字列の比較 / 第1引数:元のテキスト 第2引数:比較テキスト '************************************************************************ Public Function lsDist(baseText As String, tryText As String) As Integer Dim matrix() As Variant Dim i As Integer, j As Integer, cost As Integer lsDist = 0 If (baseText = tryText) Then Exit Function End If If (Len(baseText) = 0) Then lsDist = Len(tryText) Exit Function End If If (Len(tryText) = 0) Then lsDist = Len(baseText) Exit Function End If ReDim matrix(Len(baseText), Len(tryText)) For i = 0 To Len(baseText) matrix(i, 0) = i Next i For j = 0 To Len(tryText) matrix(0, j) = j Next j For i = 1 To Len(baseText) For j = 1 To Len(tryText) cost = IIf(Mid$(baseText, i, 1) = Mid$(tryText, j, 1), 0, 1) matrix(i, j) = WorksheetFunction.Min(matrix(i - 1, j) + 1, matrix(i, j - 1) + 1, matrix(i - 1, j - 1) + cost) Next j Next i lsDist = matrix(Len(baseText), Len(tryText)) End Function